
State Partnership Initiatives: Design, Testing and Implementation of the Net Outcomes Evaluation

Design for Estimating the Net Outcomes of the State Partnership Initiative: Final Report September 2002

Roberto Agodini Craig Thornton Nazmul Khan Deborah Peikes

Submitted to: Social Security Administration, Office of Research, Evaluation, and Statistics, OP 9th Floor, ITC Building, 500 E. Street, SW, Washington, DC 20254-0001

Project Officer: Kalman Rupp

Task Manager: James Sears

Submitted by: Mathematica Policy Research, Inc. 600 Maryland Avenue S.W. #550 Washington, DC 20025-2512 (202) 484-9220

Project Director: Craig Thornton

MPR Reference No: 8663-600 SSA Contract No: 0600-96-27333 Task No: 0440-99-38637

ACKNOWLEDGEMENTS

In developing the design for evaluating the State Partnership Initiative, we have benefited substantially from support provided by staff from the Social Security Administration. The project’s initial task manager, Paul O’Leary, guided the initial development process and helped to coordinate the critical early data processing tasks done at SSA. Subsequently, Jim Sears, the project’s final task manager, helped to shape the ultimate design and to organize all the SSA-based data processing required to test the design. Thuy Ho, with assistance from Jeff Shapiro, very ably conducted that data processing, particularly the numerous computer jobs required to test the iterative beneficiary matching process. This report literally could not have been completed without her efforts. In addition, Minh Huynh, Mary Barbour, Mike Abramo, and Joel Packman helped immensely by providing data extracts and running computer programs.

The project team also benefited substantially from the ongoing technical advice of our colleague Peter Schochet. Even more important was the work done by our great team of programmers who processed the more than 800 gigabytes of data used to assess the core net-outcomes evaluation design: Kate Bartkus, Miriam Loewenberg, Nora Paxton, and Rachel Sullivan. Finally, Vinita Jethwani helped to compile information about State Project activities and the policy context within which the initiative operates.

We have also benefited from the advice of the project’s Technical Evaluation Support Group. This group reviewed our design and provided comments on earlier versions of this report. The group included the following individuals:

• Natalie Funk, Office of Employment Support Programs, Social Security Administration

• Lex Frieden, The Institute for Rehabilitation and Research

• Alan Krueger, Department of Economics and Woodrow Wilson School of Public Policy, Princeton University

• Robert Moffitt, Department of Economics, The Johns Hopkins University

• Kalman Rupp, Office of Research, Evaluation, and Statistics, Social Security Administration

• Charles Scott, Office of Research, Evaluation, and Statistics, Social Security Administration

• Mark Shroder, Office of Policy Development, U.S. Department of Housing and Urban Development

• Jack Worrall, Department of Economics, Rutgers University

• Anthony Young, NISH, formerly the National Industries for the Severely Handicapped

The report was edited by Walt Brower and Roy Grisham and produced by William Garrett and Sharon Clark.

CONTENTS

Chapter Page

EXECUTIVE SUMMARY..........................................................................................xiii

I THE STATE PARTNERSHIP INITIATIVE AND ITS EVALUATION..................... 1

A. THE STATE PARTNERSHIP INITIATIVE.......................................................... 2

1. Characteristics of the State Project Interventions ........................................... 3
2. Characteristics of State Project Participants.................................................... 5

B. POLICY CONTEXT............................................................................................... 7

C. THE FOUR-PART EVALUATION STRATEGY ............................................... 12

1. The Core Net Outcomes Evaluation.............................................................. 14
2. The Supplemental Evaluation Component.................................................... 15
3. State Projects’ Evaluations............................................................................ 15
4. Implementation and Synthesis Analyses....................................................... 17

II SELECTION OF COMPARISON AREAS FOR THE CORE EVALUATION.......... 19

A. DEFINITION OF POTENTIAL COMPARISON AREAS.................................. 21
B. METHODS AND DATA USED TO SELECT COMPARISON AREAS ........... 24

1. Population Density ........................................................................................ 25
2. Population Growth ........................................................................................ 25
3. Unemployment Rate...................................................................................... 26
4. Total County Employment ............................................................................ 26
5. Percentage of County Land in Farming ........................................................ 27
6. Presence of Substantial Manufacturing......................................................... 27
7. Public Transportation Use ............................................................................. 27
8. Poverty Rate .................................................................................................. 27
9. Percentage of County Population in Racial/Ethnic Minorities ..................... 27
10. SSI Beneficiary Employment Rate ............................................................... 28
11. Other Area Characteristics Considered ......................................................... 28

C. ESTIMATING WEIGHTS FOR THE COUNTY CHARACTERISTICS ........... 28
D. PRELIMINARY LIST OF COMPARISON COUNTIES .................................... 36


E. FINAL SELECTION OF COMPARISON AREAS ............................................. 37
1. California....................................................................................................... 37
2. Illinois............................................................................................................ 37
3. Iowa (SSA)..................................................................................................... 37
4. Minnesota ...................................................................................................... 38
5. New Hampshire............................................................................................. 38
6. New Mexico .................................................................................................. 38
7. New York ...................................................................................................... 38
8. North Carolina............................................................................................... 39
9. Ohio............................................................................................................... 39
10. Oklahoma ...................................................................................................... 40
11. Utah ............................................................................................................... 40
12. Vermont......................................................................................................... 40
13. Wisconsin ...................................................................................................... 41

F. CONCLUSIONS AND ROBUSTNESS OF THE SELECTIONS....................... 41

III COMPARISON BENEFICIARY SELECTION PROCESS ........................................ 47
A. HOW COMPARISON GROUPS ARE SELECTED............................................ 48

1. Propensity Score Matching ........................................................................... 48
2. Potential Comparison Group Members......................................................... 50
3. When Characteristics of Potential Comparison Groups Are Measured ........ 51
4. Tests Used to Assess the Similarity of Participants and the Comparison Groups ........................................................................................................... 51
5. Characteristics Used in the Matching Process .............................................. 53

B. PRELIMINARY ASSESSMENT OF THE MATCHING PROCESS ................. 56
C. TIPS FOR IMPLEMENTING THE MATCHING PROCESS IN THE FUTURE................................................................................................................ 65

IV COMPUTING NET OUTCOMES AND ASSESSING THE VALIDITY OF THE RESULTS ........................................................................................................ 69

A. METHOD USED TO COMPUTE NET OUTCOMES ........................................ 70
B. ANALYSES USED TO ASSESS THE VALIDITY OF THE RESULTS ........... 73

1. Matching Several Periods Before Enrollment............................................... 74
2. Comparing Our Comparison Group Results to Experimental Results.......... 75
3. Alternative Validity Analyses ....................................................................... 76


C. EXAMINING PRE-ENROLLMENT NET OUTCOMES ................................... 76
D. COMPARISON OF PRELIMINARY FINDINGS FROM THE CORE AND RANDOM ASSIGNMENT EVALUATIONS FOR NEW YORK’S SPI PROJECT .............................................................................................................. 80
1. The New York Experiment ........................................................................... 81
2. Preliminary Estimated Effects: Experimental Results ................................. 83
3. The Similarity of Employment Changes Estimated Using the Core Evaluation and Experimental Designs Using the Full Treatment Group...... 84

V PROTOCOL FOR IMPLEMENTING THE CORE EVALUATION IN THE FUTURE ........................................................................................................ 87

A. CREATING THE MATCHING FILE .................................................................. 88

1. Verifying Participant SSNs ........................................................................... 89
2. Creating the Finder File................................................................................. 91
3. Obtaining and Processing the SSA Extracts ................................................. 92
4. Merging the Processed Extracts to Create Two Matching Files ................... 92

B. SELECTING COMPARISON GROUPS ............................................................. 93
C. PRODUCING NET OUTCOME ESTIMATES ................................................... 94

D. STATISTICAL POWER OF THE CORE EVALUATION ................................. 95
E. SUGGESTED FUTURE SCHEDULE ................................................................. 97

VI SUPPLEMENTAL EVALUATION POSSIBILITIES................................................. 99

A. PRELIMINARY SUPPLEMENTAL EVALUATION DESIGN......................... 99

1. Analyze Service Delivery and Participation Using State Project Data ....... 100
2. Estimate the Impact of Services Using State Project and Process Data...... 100
3. Refine Core Earnings Estimates Using State Unemployment Insurance Data for Key State Projects ......................................................................... 101
4. Estimating Intervention Costs and Effects on Tax Payments ..................... 102
5. Methodological Analysis............................................................................. 103

B. SUMMARY OF STATE PROJECT DATA....................................................... 104

1. Overview of the Current Status of the Data Collection System.................. 104
2. Overview of the State Project Data File Structure ...................................... 106


VII IMPLEMENTATION AND SYNTHESIS ANALYSES ........................................... 109
A. IMPLEMENTATION ANALYSIS..................................................................... 109

1. State Project Designs................................................................................... 111
2. Project Outreach and Recruitment .............................................................. 111
3. State Project Interventions .......................................................................... 113
4. Project Environment.................................................................................... 114
5. General Assessment of Operational Success............................................... 115
6. State Project Systems Change Activities .................................................... 116

B. SYNTHESIS OF THE EVALUATION FINDINGS.......................................... 117

REFERENCES ...................................................................................................... 121
TABLE OF ACRONYMS .......................................................................................... 125

TABLES

Table Page

I.1 SUMMARY OF INTERVENTIONS BEING TESTED BY THE STATE PROJECTS..... 4

I.2 CHARACTERISTICS OF BENEFICIARIES ENROLLED IN SSA-FUNDED AND RSA-FUNDED STATE PROJECTS (THROUGH DECEMBER 2001) ............................... 6

I.3 STATE PROJECT ENROLLMENTS.................................................................................. 8
I.4 ISSUES ADDRESSED IN THE STATE PARTNERSHIP INITIATIVE EVALUATION .................................................................................................................. 13
II.1 STATES FROM WHICH COMPARISON COUNTIES WERE SELECTED ................. 22
II.2 CHARACTERISTICS USED IN THE COUNTY MATCHING PROCESS.................... 31
II.3 MARGINAL EFFECTS AND WEIGHTS FOR 11 COUNTY CHARACTERISTICS RELATED TO SSI RECIPIENT EMPLOYMENT........................................................... 34
II.4 SSI BENEFICIARIES’ EMPLOYMENT STATUS IN JUNE 1999 AND IN THE PREVIOUS THREE MONTHS......................................................................................... 35
II.5 ADEQUACY OF THE BENEFICIARY POPULATION IN THE MATCHED COMPARISON AREAS FOR SSI AND SSDI BENEFICIARIES................................... 42
II.6 PERCENT OF COUNTIES SELECTED IN THE REGRESSION-BASED METHOD ALSO SELECTED BY ALTERNATIVE SELECTION METHODS .............................. 44
II.7 PERCENT OF COUNTIES SELECTED BY STATE PROJECTS ALSO SELECTED BY THE EQUAL-WEIGHT METHOD ............................................................................ 45
III.1 CHARACTERISTICS INITIALLY USED IN THE BENEFICIARY MATCHING PROCESS........................................................................................................................... 59
III.2 SIMILARITY OF PARTICIPANT AND COMPARISON GROUPS .............................. 61
III.3 IOWA PROJECT NUMBER OF SSI PARTICIPANTS AND COMPARISON MEMBERS IN POPULOUS AREAS BY THE ESTIMATED PROPENSITY SCORE.. 62
III.4 IOWA PROJECT SSI PARTICIPANTS AND COMPARISON GROUP MEMBERS IN POPULOUS AREAS: CHARACTERISTICS USED IN THE MATCHING PROCESS .................................................................................................... 63
IV.1 SAMPLE SIZES IN THE NEW YORK PROJECT EXPERIMENT ................................ 82


Table Page

V.1 SUMMARY OF THE PROTOCOL FOR IMPLEMENTING THE CORE EVALUATION....................................................................................................... 88
V.2 EXPECTED NUMBER OF PARTICIPANTS AND THE MINIMUM DETECTABLE DIFFERENCES (MDD) FOR THE CORE EVALUATION............................................. 96

FIGURES

Figure Page

III.1 PROCESS USED TO SELECT COMPARISON GROUPS .......................................... 57
IV.1 NEW YORK PROJECT EMPLOYMENT RATES........................................................ 78
IV.2 NEW YORK CITY DEMONSTRATION AND COMPARISON AREA UNEMPLOYMENT RATES .......................................................................................... 79
IV.3 BUFFALO DEMONSTRATION AND COMPARISON AREA UNEMPLOYMENT RATES .......................................................................................... 80
IV.4 NEW YORK PROJECT EMPLOYMENT RATES FOR TREATMENT, CONTROL, AND COMPARISON GROUPS................................................................ 83
IV.5 NEW YORK PROJECT ALTERNATIVE ESTIMATES OF THE NET CHANGE IN EMPLOYMENT RATES FOR THE FULL TREATMENT GROUP ...................... 85
V.1 PROCESS USED TO CREATE THE MATCHING FILES........................................... 90

EXECUTIVE SUMMARY

The State Partnership Initiative (SPI) funded 18 projects in 17 states to investigate new strategies that promote employment among people with disabilities, particularly those who receive disability benefits from either the Supplemental Security Income (SSI) or the Social Security Disability Insurance (SSDI) programs. The projects share a common goal but differ considerably in their methods. Their approaches include (1) helping beneficiaries understand the ways in which Social Security Administration (SSA) programs encourage work, (2) providing waivers that make work more attractive to beneficiaries, (3) delivering training to improve people’s abilities, (4) persuading employers to hire people with disabilities, and (5) trying to improve the coordination and efficiency of the overall service system.

To understand fully the effects of SPI, SSA developed a four-part evaluation that combines multiple data sets and methods to produce several estimates of the effects the State Project interventions generate. It then uses qualitative information about project implementation to synthesize those estimates and to develop an understanding of the relative performance of the projects. A major goal of this evaluation is to compare the projects to gain a better understanding of which mix of services is the most effective and learn whether services are especially useful for any particular participant groups.

Specifically, the four components of the overall SPI evaluation design are as follows:

• Core Evaluation. This evaluation component will compare key outcomes of the beneficiaries who participated in the State Projects (participants) with outcomes of a comparison group that is selected to match the participants in terms of the characteristics of the areas in which they live, as well as their demographics, prior labor market experiences, and prior benefit receipt. The core evaluation will use only SSA administrative data and can be applied consistently in the 13 State Projects that have provided participants’ Social Security numbers (SSNs) to the evaluation.

• Supplemental Evaluation. The core evaluation can be supplemented with data that the State Projects plan to provide to SSA. These include detailed data about participants’ characteristics, their receipt of project services, and their employment and benefit-receipt outcomes. These data are not yet available for analysis, but enough is known about them to provide several designs for their use in the SPI evaluation. The most important use will be to study the nature, intensity, and duration of services that participants receive.

• State Project Internal Evaluations. Each State Project is conducting its own evaluation. These evaluations can often use state data that will provide more detail than is possible using only the SSA data available in the core evaluation. However, variation among the projects in these data sources and in their evaluation methods will make it hard to compare findings across the projects.

• Implementation and Synthesis Analysis. The SPI Project Office at Virginia Commonwealth University (VCU) is documenting and analyzing the interventions fielded by State Projects. In addition, each project will submit a final report about its interventions. This information will help evaluators understand the ways in which State Projects changed the services available to participants, as well as the context within which projects operated. It will also facilitate comparisons among the projects and an overall synthesis of the findings from the other SPI evaluation components.

This report focuses on our efforts to develop and test a design for the core evaluation. Our design begins by selecting comparison areas that match the areas in which State Projects offered services and then selects matched comparison beneficiaries from those areas. It uses statistical methods to correct for any residual differences between the participants and the matched comparison group. Finally, it includes a validity test that compares results from the core evaluation with those derived from the random assignment experiment the New York State Project is using to test its intervention. We also include a summary of the procedures required to implement the core evaluation. In addition, the report describes designs for two other evaluation components: the supplemental evaluation and the implementation and synthesis evaluation.

While the core evaluation design was thoroughly tested, it is still too early to use the design to estimate net outcomes for the projects. Our tests used an early sample of participants, and the available follow-up data for that sample cover only the first three months after they enrolled in a project. It would be extremely unlikely that the projects would have had any substantial effects in that short a time. The projects offered services whose effects are likely to be seen only after six or more months. Unlike programs that place participants directly in jobs, the State Projects generally offered services that affect employment more indirectly. For example, the benefits counseling services offered by all projects help beneficiaries understand the available work incentives. That understanding should help employed beneficiaries maintain their jobs and encourage unemployed beneficiaries to seek jobs. But those effects will take a while to materialize, since they rely on the actions participants take after receiving project services.

We recommend that the core evaluation design be implemented in early 2004. At that time, there would be enough Summary Earnings Record (SER) data to select comparison beneficiaries for all participants. Also, 12 months of follow-up data would be available to measure benefit receipt for all participants and to measure employment and earnings for SSI beneficiaries. In addition, the implementation evaluation should be complete by the end of 2003, so that it can provide the qualitative information required to interpret quantitative findings from the core evaluation. A more complete evaluation that uses the SER data to measure effects on employment and earnings for SSDI beneficiaries and SSI beneficiaries who leave the rolls will have to wait until early 2005, when the full SER data for calendar 2003 become available.


OVERVIEW OF THE STATE PARTNERSHIP INITIATIVE

The 18 State Projects are addressing the persistently low rates of employment among people with disabilities, especially disabled SSA beneficiaries. For example, only 6.7 percent of disabled SSI recipients were employed in December 2000 (Pickett 2001). While this rate has risen slowly over the past 25 years (from 3.4 percent in 1976), it has been largely stable since 1987. In addition, almost two-thirds of working SSI recipients earn less than the amount SSA designates as substantial gainful activity (currently $780 a month). Furthermore, among disabled SSI recipients who work, use of SSA’s current work-incentive programs is low.

The overall goal of the SPI evaluation is to determine the extent to which State Projects’ interventions improve economic independence among SSA’s disabled beneficiaries. Specifically, how much do these interventions increase average earnings and decrease dependence on SSI and SSDI benefits? A key feature of the initiative is the variation among State Projects with respect to the services and policies they are implementing. This variation will enable the evaluation to study which combinations of services appear to be most effective and which groups of participants seem to benefit the most.

SSA and the Rehabilitation Services Administration (RSA) have provided most of the funding for SPI. SSA funded 12 projects through cooperative agreements and RSA funded 6 as systems change grants. Supplementary funding and support have also been provided by the U.S. Department of Labor (DOL) and the Substance Abuse and Mental Health Services Administration (SAMHSA). Finally, many states have provided services that supplemented those funded through SPI.

These 18 State Projects seek to increase employment among people with disabilities by improving information about current work incentives, instituting new work incentives, providing better access to vocational supports, working with employers, and changing service systems so as to place greater emphasis on employment outcomes. Most State Projects are targeting beneficiaries with severe mental illness, although many projects target other groups, including disabled students and people with HIV/AIDS. The State Projects began in fall 1998, initiated participant enrollment in 1999, and, with one exception, are expected to continue serving participants through fall 2003.

Through December 2001, the State Projects have enrolled more than 6,500 participants (4,284 in the SSA-funded projects and 2,300 in the RSA-funded ones). Projects have sent the SPI Project Office intake data for all these participants. Further, all 12 SSA-funded projects and the RSA-funded project in Utah have reported their participants’ SSNs to the national evaluation and can therefore be included in the core evaluation. Enrollment is expected to continue at least through September 2002, so the total should exceed 9,000 at current rates of enrollment (with over 6,000 expected in the 13 projects that have provided SSNs).

OVERVIEW OF THE SPI EVALUATION DESIGN

The main conclusion of this report is that the core evaluation design is feasible and that it can produce accurate estimates of the extent to which State Projects change the employment, earnings, benefit receipt, and total income of participants. Furthermore, the process of identifying comparison counties and comparison beneficiaries has been thoroughly tested and appears to produce valid comparison groups for the core evaluation. However, this selection process requires the extraction and processing of substantial data from SSA’s administrative system, as well as a great deal of researcher input.

Comparison Counties Are Well Matched to the Demonstration Counties

We selected comparison counties on a county-by-county basis. For each demonstration county, we ranked nearby counties where project services were not offered according to their similarity with the demonstration county. To match counties, we used 13 county-level characteristics that affect employment, including population density and growth, unemployment rate, prominence of farming and manufacturing, use of public transportation, poverty rate, county demographics, and the pre-demonstration employment rate of SSI beneficiaries. In ranking counties, we weighted these characteristics to reflect their relative importance in explaining employment among SSI beneficiaries; using data on the employment and characteristics of a total of 750,000 SSI beneficiaries, we estimated a separate set of weights for each State Project. As a final step, we asked State Project staff to review our initial selection of comparison counties and delete those whose service environments differed substantially from those in the demonstration counties.
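In outline, the ranking step reduces to a weighted distance calculation: for each candidate county, sum the weighted gaps between its characteristics and the demonstration county's, then sort. The sketch below is a minimal illustration of that idea, not the report's production code; it assumes the characteristics have already been standardized and the weights already estimated from the employment regression, and all function and variable names are ours.

    import numpy as np

    def rank_comparison_counties(demo_county, candidates, weights):
        """Rank candidate counties by weighted similarity to a demonstration county.

        demo_county : 1-D array of the demonstration county's 13 standardized
                      characteristics
        candidates  : 2-D array with one row per candidate county
        weights     : relative importance of each characteristic, e.g. taken
                      from a regression of SSI beneficiary employment on
                      county characteristics

        Returns candidate row indices ordered from most to least similar.
        """
        # Weighted absolute differences; smaller totals mean closer matches.
        distances = np.abs(candidates - demo_county) @ np.abs(weights)
        return np.argsort(distances)

    # Example with 3 candidate counties and 13 characteristics:
    rng = np.random.default_rng(0)
    ranking = rank_comparison_counties(rng.normal(size=13),
                                       rng.normal(size=(3, 13)),
                                       rng.uniform(size=13))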

Overall, the basic elements of this approach appear to work. We were able to select comparison counties similar to all the demonstration counties included in the core evaluation. Furthermore, the comparison counties contain enough beneficiaries to form a good pool from which evaluators can select a comparison group that matches the beneficiaries in the demonstration.

Comparison Beneficiaries Can Be Well Matched to Participants

Our process for selecting comparison beneficiaries uses SSA data to find people who are similar to participants along many important characteristics. For SSI beneficiaries, the characteristics include 7 demographics, 24 months of pre-enrollment information for 9 outcomes, 5 calendar year measures of pre-enrollment employment, 5 calendar year measures of pre-enrollment earnings, and 13 area characteristics—a total of nearly 250 variables. The list includes most of the characteristics that the literature has found to be related to the outcomes that will be analyzed as part of the evaluation. For beneficiaries who only receive SSDI, the number of characteristics is smaller, but it still includes the most important variables, such as pre-enrollment employment and earnings. Statistical matching using propensity scores is used to select comparison groups.
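To make the mechanics concrete, the sketch below implements one common variant of propensity score matching: a logistic model of participation estimated on the pooled participant and candidate records, followed by one-to-one nearest-neighbor matching on the estimated score without replacement. It is a simplified illustration under those assumptions, not the specification actually used in the core evaluation (see Chapters III and V), and all names in it are ours.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def match_comparison_group(X_participants, X_pool):
        """Select a one-to-one matched comparison group by propensity score.

        X_participants : matrix of participants' matching characteristics
        X_pool         : matrix of potential comparison beneficiaries'
                         characteristics (same columns)

        Returns indices into X_pool of the selected comparison members.
        """
        X = np.vstack([X_participants, X_pool])
        y = np.r_[np.ones(len(X_participants)), np.zeros(len(X_pool))]

        # Propensity score: estimated probability of being a participant,
        # given the matching characteristics.
        scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        part_scores = scores[: len(X_participants)]
        pool_scores = scores[len(X_participants):]

        # Greedy nearest-neighbor matching without replacement.
        chosen, available = [], set(range(len(X_pool)))
        for p in part_scores:
            best = min(available, key=lambda j: abs(pool_scores[j] - p))
            chosen.append(best)
            available.remove(best)
        return np.array(chosen)

After matching, balance tests compare the two groups characteristic by characteristic; if meaningful differences remain, the propensity model is respecified and the match is repeated.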

Our preliminary assessment of the matching process for four of the State Projects indicates that it will produce comparison groups that are well matched to participants; however, this process must be implemented on SSA’s mainframe computer. Creating the files used by the process for matching beneficiaries and by the method for producing net outcomes requires dozens of different computer programs to extract and process several million records. Some data processing took days to complete. The evaluation had to use SSA’s mainframe because (1) only SSA staff can use the calendar year employment and earnings information that the design needs, and (2) the mainframe processes the data much faster than personal computers can.


Preliminary Validity Tests Suggest That the Core Evaluation Will Produce Reliable Findings

In the interest of providing SSA policymakers with accurate evidence of the effect of the State Projects, we have designed two analyses to assess whether the core evaluation design will produce valid results. The first validity analysis selects comparison groups for participants several periods before enrollment and then examines net outcomes during the periods after that earlier date, but prior to enrollment. If our matching process is accurate, net outcomes during this period should equal zero, because neither participants nor comparison group members received any project services. The second validity analysis compares the comparison group selected using the core evaluation design with the control group selected using random assignment in the New York project’s experiment. Because experimental methods are widely regarded as the benchmark for estimating the effect of an intervention, the results of this comparison will provide strong evidence of the validity of the matching process developed for the core evaluation.
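A minimal sketch of the first test, assuming quarterly outcome arrays drawn from SSA administrative data (the array names and layout here are hypothetical): beneficiaries are matched on data ending several quarters before enrollment, and the remaining pre-enrollment quarters, in which the true effect is zero by construction, are used to check the match.

    import numpy as np

    def pre_enrollment_placebo(y_participants, y_comparison):
        """Placebo test on pre-enrollment outcomes.

        y_participants, y_comparison : (people x periods) arrays of an outcome
        (e.g., quarterly employment indicators) observed after the match date
        but before enrollment.

        Returns each period's net outcome and an approximate t-statistic;
        valid matching implies differences statistically indistinguishable
        from zero.
        """
        diff = y_participants.mean(axis=0) - y_comparison.mean(axis=0)
        se = np.sqrt(y_participants.var(axis=0, ddof=1) / len(y_participants)
                     + y_comparison.var(axis=0, ddof=1) / len(y_comparison))
        return diff, diff / se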

We conducted a preliminary implementation of these validity analyses using an early cohort of participants in the New York project, the only project that is conducting a randomized experiment and has enrolled enough beneficiaries to support the analysis.

The results suggest that our design will provide accurate evidence of the effect of the State Projects. However, the preliminary nature of the currently available data means that the tests should be conducted again with the full set of participants and a longer follow-up period.

Implementing the Core Evaluation Will Require Substantial Decision-Making by Researchers

Our experience testing the evaluation design shows that updating the short-term net outcomes requires a great deal of data processing and researcher input. For example, if the core evaluation is run in early 2004, it will require more than a month of programmer time to obtain data from the more than 30 SSA data sets needed for updating the evaluation data files. It would then take two to three months of time from a researcher and programmer team to select the matched comparison groups. This last step involves examining results from hundreds of statistical tests and modifying the selection process if those tests indicate that the initially selected comparison group is not suitable.

While we have developed computer programs to implement the core evaluation design, evaluators will have to review a few aspects of those programs. In particular, they will have to ensure that the formats and content of SSA data extracts have not changed, and make judgments about the specification of the process used to select comparison groups.

All Components of the Four-Part Evaluation Design Have Been Implemented

In addition to the core evaluation described in this report, there has been substantial progress on the other evaluation components. The major activity supporting the supplemental evaluation has been data collection. All but one State Project have developed participant-tracking systems and are providing the evaluation with data about participant characteristics, use of project services, reliance on benefit programs, and employment. Data about participant characteristics are already available from these systems and provide a clear description of the people who have enrolled. The projects and Project Office are still refining the systems that collect data on service use, benefit receipt, and employment, but they appear confident that these systems will produce reliable data in 2003 for most projects.

All the SSA-funded State Projects have developed evaluation designs for their internal evaluations. They have begun to build the required data sets, and several states have already used data from their state unemployment insurance systems to produce preliminary analyses of net effects on employment and earnings. Initially, the RSA-funded states had not been required to conduct evaluations, so their evaluation plans lag behind those of the SSA-funded states. Nevertheless, they have begun to develop plans to document their interventions and conduct at least qualitative assessments of their results.

Finally, the states and the Project Office have begun the implementation analysis. All the SSA-funded projects have provided initial descriptions of their interventions and plan to update those descriptions in 2003. The Project Office has conducted cross-project analyses of three key components: the provision of benefits counseling, the coordination of services through the DOL One-Stop Centers, and the state’s provision of Medicaid Buy-In programs that allow people with disabilities to purchase health insurance through the Medicaid program. A final round of data collection and analysis, scheduled for 2003, should provide enough information to conduct the planned implementation and synthesis evaluations.

Most Projects Will Enroll Enough Participants to Support Implementing the Core Evaluation

It appears that most projects will enroll enough participants that the methods developed for the core evaluation will be able to detect policy-relevant impacts. For example, the evaluation should be able to detect a change in annual employment rates of 4.4 percentage points for the 942 SSI beneficiaries the New York project expects to enroll. Overall, it should be possible to detect increases in employment of 14 percentage points or more in most projects. Such increases are similar to those observed for SSA’s Transitional Employment Training Demonstration (TETD), which increased employment rates by 9 to 14 percentage points over the 6 years following participants’ enrollment (an approximate doubling of employment rates relative to that project’s control group).

The power of the evaluation is greater when states are pooled. For example, if the evaluation combines the samples from the three states that implemented SSI waivers (California, New York, and Wisconsin), the evaluation should be able to detect increases in employment of just 3 percentage points. Similarly, if the samples for all states were combined, the resulting sample of approximately 6,000 beneficiaries should be able to detect increases of less than 2 percentage points. Such a combined analysis would indicate the overall extent to which a policy that gave states the general mandate and funding provided by SPI would increase beneficiary employment and affect other key outcomes.
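These figures follow from the standard minimum detectable difference (MDD) calculation for a binary outcome. The sketch below is illustrative only: it assumes a two-tailed 5 percent test, 80 percent power, equal-sized participant and comparison groups, and no regression adjustment, so its answers will differ somewhat from the report's figures, which reflect the evaluation's actual design.

    import math

    def mdd_binary(p, n_treat, n_comp, factor=2.8):
        """Minimum detectable difference for a difference in proportions.

        p      : assumed outcome rate (e.g., an annual employment rate)
        factor : roughly 1.96 (5 percent two-tailed test) plus 0.84
                 (80 percent power), or about 2.8
        """
        se = math.sqrt(p * (1 - p) * (1 / n_treat + 1 / n_comp))
        return factor * se

    # 942 New York SSI participants matched 1:1, assuming a 25 percent
    # employment rate, give an MDD of about 5.6 percentage points:
    print(round(mdd_binary(0.25, 942, 942), 3))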

I

THE STATE PARTNERSHIP INITIATIVE AND ITS EVALUATION

The State Partnership Initiative (SPI) includes 18 projects in 17 states that are investigating new strategies to promote employment among people with disabilities, primarily those who receive disability benefits from either the Supplemental Security Income (SSI) or the Social Security Disability Insurance (SSDI) programs. Most State Projects began enrolling participants in 1999 and will continue to provide services into 2003. The Initiative also included a project office to provide technical assistance to the State Projects and collect data about implementation. Finally, SPI included the development of the cross-state net outcomes evaluation described in this report.

The overall goal of SPI is to determine the extent to which the services provided by the State Projects improve economic independence among people with disabilities, particularly those receiving SSI or SSDI benefits. Specifically, how much do the services increase earnings and decrease dependence on benefits? A key feature of SPI is the variation among the State Projects with respect to the services and policies they are implementing. This variation provides an opportunity to study which combinations of services appear to be most effective and which groups of participants seem to benefit the most.

SSA developed a four-part evaluation strategy for SPI. This strategy will use multiple data sets and methods to produce several estimates of the effects generated by the State Project interventions. It will then use qualitative information about project implementation to synthesize those estimates and to develop an understanding of the relative performance of the projects. Specifically, the four components are as follows:

• Core Evaluation. This evaluation component will compare outcomes for the beneficiaries who participated in the State Projects (participants) with outcomes for a comparison group that is selected to match the participants in terms of demographics, prior employment and earnings, and prior benefit receipt. The design uses only Social Security Administration (SSA) administrative data and can be applied consistently in all the State Projects that provide participants’ Social Security numbers (SSNs) to the evaluation. The design begins by selecting comparison areas that match the areas in which State Projects offered services (Chapter II). It then selects matched comparison beneficiaries from those selected areas (Chapter III). This design appears to produce valid results based on several tests, including comparison of findings with the random-assignment experiment fielded by the New York State Project (Chapter IV). The procedures, data files, and computer programs required to implement the design are summarized in Chapter V with full details provided in a separate report (Khan et al. 2002).


• Supplemental Evaluation. The core evaluation can be supplemented using data that the State Projects plan to provide to SSA. These include detailed data about participants’ characteristics, their receipt of project services, and their employment and benefit-receipt outcomes. These data are not yet available for analysis, but enough is known about them to provide several designs for their use in the SPI evaluation (Chapter VI).

• State Project Internal Evaluations. Each State Project is conducting its own evaluation. These evaluations can often use state data that will provide more detail than is possible using only the SSA data used in the core evaluation. However, variation among the projects in these data sources and in their evaluation methods will make it hard to compare findings across the projects. Most of the projects have submitted evaluation designs (which are summarized later in this chapter) and expect to produce findings late in 2003.

• Implementation and Synthesis Analysis. The SPI Project Office at Virginia Commonwealth University (VCU) is documenting and analyzing the interventions fielded by the State Projects. In addition, each project will submit a final report about its interventions. This information will help evaluators understand the ways in which State Projects changed the services available to participants and the context within which projects operated (Chapter VII). It will also facilitate comparisons among the projects and an overall synthesis of the findings from the other SPI evaluation components.

A. THE STATE PARTNERSHIP INITIATIVE

SPI represents the first multi-agency effort under Executive Order 13078, which in March 1998 established the President’s Task Force on Employment of Adults with Disabilities. This task force brought together federal departments and agencies to support the goals of the Americans with Disabilities Act. Its mission is to evaluate existing federal programs in order to determine what changes, modifications, and innovations may be necessary to remove barriers that inhibit adults with disabilities from becoming gainfully employed.

SSA and the Rehabilitation Services Administration (RSA) have taken the lead in funding and directing SPI, which includes 18 projects in 17 states.1 The U.S. Department of Labor (DOL) and the Substance Abuse and Mental Health Services Administration (SAMHSA) have also provided supplementary funding and support for SPI. These 18 State Projects (12 cooperative agreements funded by SSA, 6 systems change grants funded by RSA) seek to increase employment among people with disabilities by improving information about current work incentives, instituting new work incentives, providing better access to vocational supports, working with employers, and changing service systems so as to place greater emphasis on employment outcomes. Most State Projects are targeting beneficiaries with severe mental illness, although many target other groups, including disabled students and people with HIV/AIDS. The State Projects began in fall 1998 and are expected to continue through fall 2003.

1The RSA initiative focused on activities aimed at changing the overall system that helps people with disabilities obtain employment and live independently. As a result, the overall SSA/RSA effort is sometimes referred to as the State Partnership and Systems Change Initiative, although the acronym SPI is still used.

1. Characteristics of the State Project Interventions

The State Projects developed a variety of ways to promote employment among disabled beneficiaries (see Table I.1). In particular, they tended to offer services designed to address the following matters:

• Benefit Policies. Most of the State Projects offer benefits planning and assistance programs, which provide in-depth analysis, planning, and assistance related to the effect of employment on an individual’s cash benefits, health care coverage, and eligibility for other government support programs. This is intended to help beneficiaries understand and take advantage of work incentives and options that the current program offers. Models of benefits planning and assistance vary widely across State Projects. Key components of the benefits planning and assistance models currently under way include information and referral, problem solving, benefits assistance, benefits planning, and long-term benefits management. Most of the State Projects also tested new policies or services designed to make employment more attractive to beneficiaries. There were two general types of such policies: Medicaid Buy-Ins and SSI Waivers. The Buy-Ins enable disabled beneficiaries who return to work to purchase Medicaid coverage. The waivers, being offered in four states, change current SSI regulations that might discourage beneficiaries from seeking work. These include waivers that will let working beneficiaries keep more of their benefits, allow them to accumulate savings without being subject to the current asset limits, and protect them from having a continuing disability review triggered solely because of their participation in a State Project.

• Service System Barriers. Most projects tried to improve coordination among state agencies and the various organizations that provide employment supports. A common approach was to include disability-related services in DOL’s One-Stop Centers. Another was to promote Service Coordination/Integration through broad initiatives that foster better interaction among state agencies that share responsibility for encouraging work among people with disabilities. These efforts are intended to improve the environment for both participants and non-participants.

• Human Capital and Personal Barriers. Many of the projects sought to help beneficiaries obtain skills that would help them compete in the labor market. These include Employment Assistance programs that provide participants with job placement and support services to help them find and maintain employment. Services were also provided through vouchers that enable beneficiaries to obtain vocational services from a vendor of their choice. Several projects also tested ways to use Peer Support to help beneficiaries deal with the world of work.

• Employment Market Barriers. A few State Projects used education, outreach, and direct incentives to encourage employers to hire more disabled beneficiaries. Also, some states used other funding (outside SPI) to fund initiatives to promote employer awareness of the abilities and employment potential of people with disabilities.

TABLE I.1

SUMMARY OF INTERVENTIONS BEING FIELDED BY THE STATE PROJECTS

[The table is a grid of X marks showing which interventions each State Project is fielding; the individual cell marks are not legible in this transcript. Columns: Benefit Policies (Benefits Counseling,a SSI Waivers,b Medicaid Buy-Inc); Service System Barriers (One-Stop Centers,d Service Coordination and Integratione); Human Capital/Personal Barriers (Employment Assistance,f Peer Supportg); Employment Market Barriers (Employer Outreachh). Rows: SSA-Funded States (California, Illinois, Iowa (SSA), Minnesota, New Hampshire, New Mexico, New York, North Carolina, Ohio, Oklahoma, Vermont, Wisconsin) and RSA-Funded States (Alaska, Arkansas, Colorado, Iowa (RSA), Oregon, Utah).]

aProgram will provide information to participants about work incentives and how work will affect participants’ benefits.

bStates have obtained SSI waivers that will enable beneficiaries to keep more of their earnings.

cStates have passed and/or implemented Medicaid Buy-In provisions that would enable people with a disability to purchase Medicaid coverage. States vary on their plans to implement the Buy-In, set the income and resource levels, and set the level of premiums that participants are expected to contribute.

dStates are trying to incorporate new programs and enhance access of existing programs for people with disabilities in One-Stop Centers, which provide a single location where people can access employment and training, education programs, and information on career exploration.

eState Projects are working to make new and existing services for people with disabilities more accessible by improving the service links and connections between various assistance programs.

fStates are helping project participants to obtain jobs by providing job search and other supports, or by offering vouchers with which program participants can choose service providers and the services they want.

gSome states are linking participants with peer support (other people with disabilities), both inside and outside the workplace.

hState Projects are working with employers to help people with disabilities find jobs.

As SPI progressed, the State Projects refined their interventions, often adding components or modifying others. Thus, the specific intervention offered to participants changed over time, with later enrollment cohorts getting a somewhat different mix of services and opportunities than the early cohorts. Details will be available in the final reports being developed by the projects and by the national SPI Project Office.

2. Characteristics of State Project Participants

The various State Projects have been using considerably different outreach strategies to enroll volunteers. Four states used mailing lists generated from SSA administrative data to send out invitation letters to eligible beneficiaries. One state also held recruitment/intake meetings at local restaurants to encourage people who had been invited to participate to find out more about the project services and opportunities. Projects have also relied on referrals from other agencies that serve people with disabilities, including vocational rehabilitation, local SSA offices, county mental health agencies, independent living centers, and other community organizations.

A primary distinction among the projects is that the 12 SSA-funded projects targeted only people receiving SSI or SSDI disability benefits, while the six RSA-funded projects served people with disabilities but were not required to serve exclusively Social Security beneficiaries. As a result, while 99 percent of the participants enrolled in the SSA-funded projects reported receiving SSI or SSDI benefits at the time they enrolled, only 21 percent of the participants entering RSA-funded projects received such benefits (Jethwani et al. 2002).2 Many of the people enrolled in the RSA-funded projects receive Temporary Assistance for Needy Families (TANF). All the SSA-funded projects enrolled both SSI and SSDI beneficiaries, with the exception of New York and Oklahoma, which targeted SSI beneficiaries only.3

In many regards, the people enrolled in the SSA- and RSA-funded projects are similar (Table I.2). They are all generally of working age and are about evenly split between men and women. Many (19 percent) do not have a high school diploma or equivalent, but about 45 percent overall have some postsecondary education. One major difference between participants in the two sets of projects is the distribution of disabling conditions. The percentage of participants who reported a mental or emotional condition as the cause of their disability was almost twice as large in the SSA-funded projects as in the RSA-funded projects. This reflects, in part, the specific targeting criteria of many SSA-funded projects, which sought to serve people with mental illnesses. In addition, participants in the SSA-funded projects were about 10 percentage points more likely to be employed at the time they enrolled.

2These figures are as of December 2001, the most recent date for which detailed data are available.

3Oklahoma included some SSDI beneficiaries in an early phase of its project but plans to focus exclusively on SSI beneficiaries for its full project. New York serves SSI recipients and concurrent beneficiaries.


TABLE I.2

CHARACTERISTICS OF BENEFICIARIES ENROLLED IN SSA-FUNDED AND RSA-FUNDED STATE PROJECTS (THROUGH DECEMBER 2001)

Characteristic                                      SSA-Funded    RSA-Funded    All
                                                    Projects      Projects      Projects

Receipt of SSI/SSDI Benefits (Percent)              99.0          21.0          71.1

Disability Type (Percentage)
  Mental or emotional condition                     53.5          26.5          43.9
  Mental retardation                                 8.7           5.2           7.4
  Orthopedic impairment or amputation               14.1          21.9          16.9
  Visual or hearing impairment                       4.0           9.0           5.8
  Traumatic brain injury or spinal cord injury       6.7          10.0           7.9
  Other condition                                   10.9          22.3          15.0
  Missing                                            2.0           5.0           3.1

Average Age (Years)                                 39.3          36.6          38.4

Male (Percentage)                                   51.2          47.0          49.7

Prior Education (Percentage)
  Less than high school                             18.6          19.6          18.9
  High school diploma or GED                        30.3          34.3          31.8
  Postsecondary education                           33.4          33.6          33.5
  B.A. or higher degree                             13.2           9.0          11.7
  Missing                                            4.5           3.5           4.1

Employed at Enrollment into SPI (Percentage)        39.0          29.3          35.5

Number of Participants                              4,130         2,300         6,430

SOURCE: State Project intake data reported in Jethwani et al. (2002).


Given the voluntary nature of enrollment, it is not surprising that the State Projects have enrolled a sample of participants whose employment attachment and vocational activity appear stronger than those of the overall beneficiary population. Among the SSA-funded projects, about 39 percent report employment at intake. In contrast, only about 7 percent of disabled SSI or SSDI recipients are employed at any given time (Pickett 2001).

The Illinois State Project was unique in that it targeted services toward high school and college students who were receiving disability benefits. Thus, the characteristics of its beneficiaries differed substantially from those in other projects (Peikes et al. 2001; Jethwani et al. 2002). In particular, average age for participants in Illinois was 19 years (compared with 39 years for all SSA-funded projects). Also, 85 percent of the Illinois participants had not yet completed high school, and more than half (53.4 percent) reported mental retardation as their primary disabling condition. Thus, the intervention and potential net outcomes for this project are likely to be substantially different from those of the other projects that targeted working-age people with disabilities.

The State Projects have enrolled more than 6,500 participants through December 2001 (Table I.3). This includes 4,284 participants in the SSA-funded projects and 2,300 in the RSA-funded ones. Projects have reported intake data for all these participants to the SPI Project Office and, in 13 of the states, have reported their Social Security numbers to the core evaluation. Enrollment is expected to continue at least through September 2002, so the total should be over 9,000 if current rates of enrollment are maintained.

The core evaluation focuses just on the 13 projects that supplied Social Security numbers (all 12 of the SSA-funded State Projects as well as the RSA-funded project in Utah). None of the RSA-funded projects were required to conduct formal evaluations, and only Utah subsequently decided to participate in the core evaluation. Nevertheless, all RSA states plan to provide descriptions of their project activities and all except Alaska are providing participant tracking data that describe the characteristics of people served directly by their projects and their service use, benefit receipt, and employment outcomes. For Utah, the core evaluation will have SSA administrative data on only the approximately 360 people who received SSI or SSDI benefits.

TABLE I.3

STATE PROJECT ENROLLMENTS

                            Enrollment Through    Enrollment Through    Projected Enrollment
State Project               December 2001         June 2002             Through September 2002

Total, All Projects                6,584                 8,317                  9,040

SSA-Funded Projects                4,284                 5,439                  5,977
  California                         220                   232                    234
  Illinois                           156                   212                    233
  Iowa (SSA)                         520                   626                    660
  Minnesota                          174                   261                    339
  New Hampshire                      114                   153                    158
  New Mexico                         660                   766*                   823
  New York                           500                   795                    942
  North Carolina                     293                   345                    373
  Ohio                               426                   544                    599
  Oklahoma                            45                    49                     51
  Vermont                            584                   739                    794
  Wisconsin                          592                   717                    771

RSA-Funded Projects(a)             2,300                 2,878                  3,063
  Arkansas                           299                   338                    353
  Colorado                           228                   322                    371
  Iowa (RSA)                         129                   129                    129
  Oregon                             261                   292                    292
  Utah                             1,383                 1,797                  1,918

SOURCE: State Project enrollment data submitted to the SPI Project Office. The projections assume a continuation of recent enrollment rates through the end of September 2002. Some projects will continue to enroll after that date, so the total sample available for tracking long-term outcomes will be larger than that shown in the table. An asterisk indicates that the June 2002 enrollment figure is based on preliminary numbers.

NOTE: While virtually all the people enrolled in the SSA-funded projects were participating in the SSI or SSDI program, only about 21 percent of the people enrolled in the RSA-funded project received such benefits.

(a) Alaska also received funding from RSA to operate a State Project but decided not to provide quantitative data to the SPI evaluation. We have therefore omitted it from this table and from the discussions of the core evaluation.

B. POLICY CONTEXT

The State Projects are addressing the persistently low rates of employment among disabled SSA beneficiaries. For example, only 6.7 percent of disabled SSI recipients were employed in December 2000 (Pickett 2001). While this rate has risen slowly over the past 25 years (from 3.4 percent in 1976), it has been stable since 1987. In addition, almost two-thirds of working SSI recipients earn less than the amount SSA designates as substantial gainful activity (currently $780 a month). Furthermore, among those disabled SSI recipients who work, use of SSA's current work incentive programs is low. For example, among working SSI recipients with disabilities, only 31 percent use the work incentives available under Section 1619, and only 4 percent use a work incentive such as a Plan for Achieving Self-Sufficiency (PASS) to shelter some of their income (Pickett 2001).4

The low employment rates among disabled SSI recipients have proven difficult to increase substantially. SSA has already completed two multisite demonstrations to test employment support programs. Participation was low in both demonstrations: only about 6 percent of eligible beneficiaries enrolled in the programs (Kornfeld et al. 1999; Prero and Thornton 1991). In addition, although the demonstrations did increase participant earnings, the increases were small in absolute terms. The Transitional Employment Training Demonstration (TETD), which served SSI recipients with mental retardation, increased earnings by 72 percent over the six years following enrollment, but the absolute increase in earnings was only $600 to $800 a year (Decker and Thornton 1995). More recently, Project Network, which enrolled a wide cross-section of SSI and SSDI beneficiaries, increased earnings by 11 percent, or $220 per year, over the two years following enrollment (Kornfeld et al. 1999). These small absolute increases have translated into negligible reductions in benefit payments, mainly because current regulations disregard a substantial amount of earnings when computing benefits, so the small absolute changes in earnings translate into even smaller effects on benefit receipt.
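
To see why modest earnings gains yield even smaller benefit reductions, consider a stylized SSI benefit computation. The sketch below applies the standard SSI earned income disregards (a $20 general income exclusion, a $65 earned income exclusion, and a 50 percent benefit reduction rate on the remainder); the dollar amounts are illustrative assumptions of ours, not figures from the demonstrations.

    # Stylized SSI benefit computation (illustrative numbers only).
    # Standard disregards for a recipient with only earned income: a $20
    # general income exclusion and a $65 earned income exclusion, after
    # which benefits fall by $1 for every $2 of remaining earnings.

    def monthly_ssi_benefit(base_benefit: float, monthly_earnings: float) -> float:
        countable = max(0.0, monthly_earnings - 20.0 - 65.0) / 2.0
        return max(0.0, base_benefit - countable)

    base = 500.0                               # hypothetical monthly benefit
    print(monthly_ssi_benefit(base, 0.0))      # not working: 500.0
    print(monthly_ssi_benefit(base, 60.0))     # +$720/year in earnings: still 500.0
    print(monthly_ssi_benefit(base, 200.0))    # +$2,400/year: benefit falls to 442.5

An earnings gain of the size estimated for the TETD ($600 to $800 a year, or roughly $50 to $67 a month) falls largely within the disregards, so it produces little or no change in the benefit paid.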

Low employment rates among beneficiaries are not necessarily surprising, since all SSA's disabled beneficiaries presumably have a permanent and total disability that precludes substantial gainful activity. Nevertheless, as new training techniques have demonstrated the feasibility of helping even people with severe disabilities obtain and hold jobs, there is a corresponding interest in promoting employment among SSA's disabled beneficiaries. This interest has led to a consensus that no person with a disability should be denied the right to participate fully in society, including work, because of external barriers that reasonably can be removed.

For SSA, efforts to promote beneficiary employment and self-support date back to before the 1980 amendments to the Social Security Act, which added several work incentives to the SSI program. More recently, the Ticket to Work and Work Incentives Improvement Act of 1999 (Public Law 106-170) created a number of important new initiatives that will affect people who receive disability benefits. In addition, several important recent executive initiatives (the New Freedom Initiative and the President's Task Force on Employment of Adults with Disabilities) have sought to identify and eliminate barriers to employment for people with disabilities. The Ticket Program legislation and the executive initiatives have produced numerous other employment support initiatives in addition to SPI, including efforts by the Centers for Medicare & Medicaid Services (CMS) and the U.S. Department of Labor (DOL).

4A PASS is an SSA-approved plan that specifies an employment goal and the expenditures required to pursue it. Under a PASS, earned income is disregarded from the benefit computation process when it is used for a purpose specified in the plan. The goal of this work incentive is to help beneficiaries obtain the means to increase their employment.

The implementation of the various demonstrations and initiatives has substantially affected the SPI demonstration and its evaluation. The influx of additional resources has enabled some State Projects to offer their SPI participants enhanced services or to offer more beneficiaries services similar to those provided in their State Project. In addition, the new demonstrations and initiatives affect the environment against which the State Projects will be compared. As these new initiatives extend nationally and begin to promote the viability of work for all beneficiaries, the effect of services provided by State Projects will become harder to detect and interpret.

Since the start of the SPI, nine other major initiatives have begun to provide services or change policies designed to promote employment among people with disabilities, including those who are receiving benefits from SSA. The following list provides an overview of these policy initiatives:

• Benefits Planning, Assistance and Outreach. This SSA grant program funds benefits planning for disabled beneficiaries who are attempting a return to work. Benefits planners will provide direct advice and assistance to SSI and SSDI beneficiaries (1) by explaining SSA work incentives and the effects of work on benefits, and (2) by providing information on the Vocational Rehabilitation (VR) system and other available supports.

• Medicaid Buy-In. Recent legislation enables states to modify their Medicaid programs to provide disabled workers with better access to health insurance. These new programs, which are now available in 25 states, give beneficiaries a way to obtain affordable health insurance that will cover treatment for their conditions. The Buy-In programs expand coverage by broadening Medicaid income and resource eligibility standards and by creating sliding-scale premium arrangements that encourage people with disabilities to maintain employment.

• Medicaid Infrastructure Grant. This CMS grant program provides funding to states that want to modify their Medicaid programs to implement a Buy-In program or provide other employment incentives for people with disabilities.

• Demonstration to Maintain Independence and Employment. This CMS-funded program supports efforts in three states (Mississippi, Rhode Island, and Texas) and the District of Columbia to enable people with chronic, disabling conditions to get medical benefits without having to first qualify for disability benefits (which typically requires that they leave their jobs). The demonstration will allow states to provide health care services and supports to working people who need to manage the progression of their diseases. It is expected that this demonstration will be expanded to additional states.

• Work Incentive Grants. The Work Incentive Grant Program is funded by DOL to enhance employment opportunities for people with disabilities. The grants encourage One-Stop Career Centers to develop innovative ways to ensure that this population can obtain comprehensive and accessible employment services that will address their barriers to employment. DOL expects to make grants to all states in the next few years.

• Ticket to Work. This SSA program introduced a new performance-based method of paying for services that will help disabled beneficiaries find and hold jobs, while exercising more consumer choice. SSA will issue eligible beneficiaries a ticket that they can take to a service provider of their choice. Providers can choose whether to accept a beneficiary's ticket. If they do accept and try to help the beneficiary obtain employment, their payments will be based on achieving specific milestones, including a beneficiary's successful transition from the disability rolls to self-supporting employment. The Ticket Program was introduced in 13 states during 2001. It is scheduled to be operational in all states by January 2004.

• Olmstead Grants. This CMS grant program helps states place into an integrated setting those qualified persons with disabilities who are currently in institutions or being assessed for institutionalization. This initiative includes three categories of systems grants to states: (1) Nursing Facility Transition Grants, (2) Community-Integrated Personal Service and Support Grants, and (3) "Real Choice" System Change Grants.

• Indexing of the SGA Amount. Since the start of SPI, SSA has begun to adjust the average monthly earnings amount used to determine whether work done by disabled beneficiaries is substantial gainful activity (SGA). The annual adjustments are intended to correct for inflation. Before this change, the SGA level was set by regulation at the Commissioner's discretion, without automatic annual adjustment.

• Employment Assistance Grants Through DOL's Office of Disability Employment Policy. This grant program targets planning and implementation activities to enhance the availability and provision of employment services for people with disabilities within the One-Stop delivery system. Technical assistance grants also are being offered to One-Stop Career Centers, State and Local Workforce Investment Boards, Youth Councils, and Workforce Investment Act Grant Recipients who serve adults and youths in order to improve employment outcomes for people with disabilities.

These various initiatives create a dynamic environment that complicates the SPI evaluation. While employment and training evaluations have long faced the challenge of accounting for local variation in service environments, these new initiatives are being introduced during the SPI effort and, in many cases, offer service interventions that are also offered by the State Projects. They therefore change the mix of services available to participants and potential comparison group members at the same time that the SPI State Projects are trying to deliver new services to participants. As a result, the evaluation must assess local service environments for both static and dynamic differences that might affect employment of disabled SSI and SSDI beneficiaries. If the new initiatives successfully expand the availability of employment-support services, the net extent to which the State Projects can expand services to participants will be reduced, which would in turn reduce the potential net outcomes produced by the State Projects, relative to what was expected when those projects were designed.

Ideally, the evaluation would interview SPI participants and comparison group members to measure precisely the services each group used. The evaluation could thus obtain an accurate measure of the net intervention delivered through the State Projects. Such a survey is not planned, however. Instead, the evaluation will rely on the process analyses conducted by the states and the SPI Project Office to assess the general availability of services in the demonstration and comparison areas. This analysis will try to collect information about the extent of the non-SPI services, using such measures as the number of participants those programs serve, the average level of expenditures per participant, and the presence of any waiting lists. We return to this issue in Chapter V, where we discuss the supplemental SPI evaluation component.

C. THE FOUR-PART EVALUATION STRATEGY

As noted at the beginning of this chapter, SSA conceived a four-part strategy to evaluate the SPI State Projects. First, the core evaluation uses a common methodology and SSA administrative data to estimate the net outcomes produced by each State Project. Second, the supplemental evaluation draws on state data supplied by the projects to extend and enhance the core evaluation. Third, each State Project is conducting its own evaluation of net outcomes. Fourth, the Project Office is working with the projects to document how the project interventions were implemented. This last component will also provide the basis for synthesizing results from all the components in order to compare results across the projects and draw policy conclusions from the initiative.

An important aspect of this four-part strategy is the overlap of components. In particular, the State Projects' evaluations and the net outcomes evaluation will use different data sources and methods to estimate the effects of the interventions. The net outcomes evaluation offers the advantage of a consistent method and data source for evaluating all State Projects. Thus, differences in the estimated effects for the State Projects can be attributed largely to the projects themselves rather than to underlying methodological differences. The State Projects' evaluations will use a mixture of methods and data, but often will have access to data that are more detailed than the SSA records to be used in the core net outcomes evaluation. Eventually, the estimates generated by all the evaluation components will be used to develop a comprehensive understanding of the State Projects and their effects.

Together, the four evaluation components will address seven key research questions (Table I.4):

1. Feasibility. Whether states can operate employment-promotion interventions on a policy-relevant scale

2. Participation Rates. The extent to which beneficiaries volunteer to participate in the State Projects

3. Intervention Delivery. The nature and magnitude of the services received by participants

4. Net Outcomes. The effect of the State Projects on key outcomes, including employment, benefit receipt, and total income

5. Costs and Benefits. The net cost of the interventions to the federal government, state governments, and participants

6. Subgroup Effects. Whether there are specific types of participants or State Projects that generate above-average effects

7. Demonstration Context. The nature of the policy and economic environments in which the State Projects operate and the extent to which those environments influence State Project effectiveness

TABLE I.4

ISSUES ADDRESSED IN THE STATE PARTNERSHIP INITIATIVE EVALUATION

Each program or policy issue below is addressed by one or more of the four evaluation components: the core net outcomes evaluation, the supplemental net outcomes evaluation, the State Project evaluations, and the Project Office implementation and participation analysis.

• Feasibility: Can states operate employment-promotion programs for disabled beneficiaries? Do those programs operate on a policy-relevant scale?

• Participation Rates: How many eligible beneficiaries choose to enroll in a State Project? How do the characteristics of participating beneficiaries differ from those of nonparticipating beneficiaries?

• Intervention Delivery: What services do participating beneficiaries receive? Do they receive more services than they would in the absence of the State Project?

• Net Outcomes: Do State Projects increase employment and earnings of participating beneficiaries? Reduce targeted federal benefits (SSI, SSDI, and TANF)? Increase the income of participating beneficiaries? Affect participant tax payments and health insurance costs?

• Costs and Benefits: What are the net costs of the State Projects to the SSI, SSDI, and TANF programs? To other programs (housing supports, food stamps, General Assistance, Medicare, Medicaid, Workers' Compensation, state vocational rehabilitation programs, and private disability insurance)? To participating beneficiaries?

• Subgroup Effects: Do the effects differ among participants with different characteristics and preenrollment experiences (for example, groups defined by prior employment, receipt of SSDI or SSI benefits, or demographics)? Do the effects differ across the State Projects?

• Demonstration Context: How do the characteristics of the State Projects and the state economies influence the effectiveness of the State Projects?

1. The Core Net Outcomes Evaluation

The core net outcomes evaluation will develop state-specific estimates of the net change in the following participant outcomes: receipt of SSI or SSDI benefits, amount of SSI or SSDI benefits received, employment status, earnings, and total income. State-specific estimates were desired because SSA wanted to identify best practices by comparing the results of the various State Projects. Furthermore, given the differences in the SSI and SSDI programs, particularly with respect to work incentives and the work histories of the beneficiary populations, the evaluation will generate separate estimates for SSI and SSDI beneficiaries (concurrent beneficiaries are included with the SSI beneficiaries).

In designing the core evaluation, we looked for an approach that had the following characteristics:

• Uniform Methodology That Could Be Applied Consistently to All the State Projects. In this way, differences in state-specific net outcome estimates could be compared to obtain a measure of relative program performance. Estimates would not differ because of underlying differences in data or estimation method. In contrast, individual State Project evaluations may have access to richer data, but differences among the states in data quality, definitions, and coverage would make it extremely difficult to separate real differences in project effects from differences in the underlying data and analysis methods.

• Exclusive Use of SSA Administrative Data. The core design uses only SSA administrative data. These data are consistently measured among states, including those that did not participate in SPI and are being used as a source of comparison beneficiaries.

• Potential for Automation. SSA hoped to have a design that would require relatively few resources to implement. The expectation was that the data sets would be updated on a regular basis and that net outcomes could be tracked over time using a highly automated process that did not require substantial researcher input.

• Non-experimental Method. While SSA has long recognized the merits of using experiments, SPI did not require that states use such methods. As a result, only 4 of the 12 SSA-funded State Projects used random assignment, and none of the RSA-funded projects did so. To have a consistent design for all states, the core evaluation therefore focused on comparison-group methods rather than on experiments.

In addition to meeting these goals, the evaluation design had to deal with modest enrollment levels in the State Projects. While overall enrollment in SPI is substantial, the number of participants in many of the projects is small enough that the evaluation will have the statistical power to detect only very large net changes in participant outcomes.

2. The Supplemental Evaluation Component

The supplemental component was intended to enhance and extend the core component by using data provided by the State Projects. These data can provide a basis for investigating how the State Projects affect participation in other government programs, as well as for determining the net costs of the intervention. They also could provide detailed information that will enable the evaluation to examine the time profile of employment and earnings effects.

The nature of the data provided by the State Projects remains problematic. So far, the focus of the Project Office and State Projects has been on collecting accurate participant tracking data. There has been little attempt to provide state administrative files that could be used to supplement the core net outcomes evaluation. In addition, many of the projects have scheduled their efforts to construct data files to culminate in the fifth year of their operations, when they will be conducting their internal evaluations. Thus, those data are not currently available for testing alternative supplemental evaluation designs.

All SSA-funded states have agreed to provide their final evaluation data sets to SSA. This will enable SSA to conduct several supplemental analyses. If the Project Office's process analysis identifies projects that have similar interventions, evaluation designs, and data sources, the supplemental analysis could pool project data from several states to increase the statistical power. It may also be possible to use quarterly earnings reports from state Unemployment Insurance records to supplement the core analysis of employment and earnings. This will be particularly useful for the SSDI beneficiaries, for whom SSA has only annual earnings data.

3. State Projects’ Evaluations

All the SSA-funded State Projects have in place a net outcome evaluation design based on at least one of three approaches: (1) random assignment, (2) comparison group, and (3) pre-post.

a. Random Assignment

Currently, four states—Illinois, New Hampshire, New York, and Oklahoma—are using random assignment, the most rigorous design approach being used among the State Projects. Of these four, the New York evaluation has been the most successful at obtaining an adequate sample size. By the end of 2001, this project had randomly assigned more than 2,700 beneficiaries to one of two treatment groups or to a control group. This sample size and the subsequent participation rate of beneficiaries who are assigned to the treatment groups appear to be high enough to give New York a good chance of detecting net changes in outcomes (we return to this issue in Chapter IV). Enrollment in the other three random assignment states has been fairly low, so those projects are unlikely to be able to detect effects of the size their interventions can plausibly be expected to produce.

b. Comparison Group Design

Fourteen State Projects have opted to use some type of comparison group design. Seven are relying solely on comparison group designs to evaluate their interventions, six are using comparison group and pre-post designs to evaluate their interventions, and one (Oklahoma) is using a comparison group design to supplement its random assignment design.

These designs compare outcomes of participants with outcomes for a comparison group consisting of beneficiaries who are matched with participants. The strength of the evidence these evaluations produce depends crucially on how well the State Projects are able to identify comparison beneficiaries who, except for participation in the State Project, resemble participants along all characteristics, both measurable and unmeasurable. The strength of the evidence will also depend on the quality of State Project data and the number of participants enrolled.

The comparison groups proposed by the State Projects fall into one of three categories:

1. Within-state comparison group designs that compare outcomes of participants with outcomes of matched beneficiaries in nondemonstration sites within the state

2. Out-of-state comparison group designs that compare outcomes of participants in the demonstration area with outcomes of beneficiaries who met the same target criteria in another state that is not participating in SPI

3. Eligible nonparticipant designs that compare outcomes of State Project participants to those of beneficiaries who were eligible to participate but did not

The within-state comparison group designs have the best chance of finding comparison groups that individually are well matched to the participants and are subject to the same policy and economic environment as participants. The out-of-state comparison group designs are likely to have greater difficulty matching environments. Out-of-state designs face the additional problem of inconsistently measured data when the State Projects try to combine their data on participants with data other states provide about comparison group members.

The eligible nonparticipant design draws its comparison group from people in the same policy and economic environment. Nevertheless, this design may produce unreliable evidence of program effects, because eligible nonparticipants may differ fundamentally from those beneficiaries who enroll voluntarily. Differences in measurable characteristics can be taken into account, but unmeasured characteristics such as motivation may still bias results.

c. Pre-Post Design

Six states use some type of pre-post design. These designs rely on people's experiences before enrollment to provide a comparison for evaluating changes after enrollment. This method assumes that past experiences indicate participants' future activities in the absence of State Project services.

This method has been largely discredited for evaluating employment and training programs. In essence, a pre-post analysis assumes that people would not change their preenrollment employment and earnings in the absence of the program. This assumption is essentially a bet against the success of people with enough motivation to volunteer for an employment support program. However, experience indicates that people who enroll in training programs are seeking a change and will try to change even if they do not get into the program. Thus, their preenrollment employment levels are a poor indicator of their long-term employment patterns. This is clearly illustrated by the results from the TETD (Decker and Thornton 1995), which was evaluated using random assignment and seven years of SSA's administrative data. The evaluation found that the control group, which was not offered demonstration services, nevertheless managed to double its average earnings during the six-year follow-up period. In this case, a pre-post comparison using only the treatment group would have overestimated program effects by more than 100 percent, compared with the random assignment results. Also, the evidence indicates that the average earnings of the treatment and control groups were essentially constant during the year prior to enrollment in the study. Thus, efforts to control for earnings in the preenrollment period would not have improved the pre-post estimates substantially. All this means that people's past experience serves as an inadequate "control" for judging future performance, and that pre-post evaluations are unlikely to provide strong evaluation evidence.
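
The source of the bias can be expressed simply: a pre-post design estimates the program effect as participants' postenrollment outcome minus their preenrollment outcome, while an experiment estimates it as the difference between treatment and control group outcomes in the same follow-up period. The sketch below uses hypothetical earnings levels, chosen by us only to mirror the doubling pattern observed in the TETD control group; they are not actual TETD figures.

    # Hypothetical average annual earnings (illustrative values only).
    pre_treatment  = 1000   # treatment group, year before enrollment
    post_treatment = 2700   # treatment group, follow-up period
    post_control   = 2000   # control group, follow-up period (earnings
                            # doubled even without demonstration services)

    pre_post_estimate     = post_treatment - pre_treatment   # 1700
    experimental_estimate = post_treatment - post_control    # 700

    # The pre-post design attributes the control group's own earnings
    # growth (2000 - 1000 = 1000) to the program, overstating the true
    # effect (700) by more than 100 percent.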

4. Implementation and Synthesis Analyses

In order to interpret the various estimates of net outcomes, the SPI Project Office will work with the projects to document their implementation. This implementation analysis is expected to provide general information about the following operational issues:

• Program size (number of participants and average caseload)

• Types and intensity of services provided

• Description and analysis of service implementation

• Characteristics of participants

• Characteristics of the State Project participant enrollment processes, including a quantitative analysis of the differences between participants and eligible nonparticipants

• Length of participation

• Job placement rates

• Major employment or policy shifts during State Project implementation

• Costs for major, direct service components

The process analysis will highlight three interventions that are expected to be particularly important for SSA's policy development: (1) benefits counseling, (2) use of DOL One-Stop Centers, and (3) Medicaid Buy-In programs. Not only are many states using these interventions as part of their State Project, but major policy initiatives have been implemented that will build on these interventions. Thus, it is especially important that the experience of SPI be used to inform the development of those initiatives.

The Project Office evaluation will use qualitative and quantitative data. The qualitative information will be collected during visits to the State Projects and from reports they submit. This information will be used to document and assess project operations and to help develop suggestions for future employment support efforts. The quantitative data (which the State Projects are collecting directly from participants and from state administrative data systems) will pertain to the characteristics of participants and to the nature, duration, and costs of the services participants receive from State Projects.

It has taken a substantial effort to collect quantitative data on participants, and complete data are not yet available. Data about participant characteristics at the time they enrolled in a State Project are available and generally appear to be internally consistent and complete. Data about services provided to participants, benefit receipt following enrollment, and employment are still being collected and cleaned. Many of the problems with the postenrollment data seem to be related to the challenge of creating a new data collection and participant tracking system. Also, projects collected and reported data in many different ways, which has complicated the process of creating uniform demonstration research files. The projects and the Project Office expect to have resolved the problems and to have complete data files available in 2003.

In addition to the implementation analysis, the Project Office will also begin the process of synthesizing the results from the four evaluation components in order to draw overall conclusions about effective strategies for promoting employment and financial independence among disabled beneficiaries. Given the diverse methods and data used in the evaluations and the variety of interventions being fielded, the synthesis will have to rely on qualitative judgments rather than on quantitative meta-analytic methods.

II. SELECTION OF COMPARISON AREAS FOR THE CORE EVALUATION

The first step toward designing the core evaluation was to identify a set of comparison counties similar to the counties in which the State Projects enrolled beneficiaries. In particular, we wanted to identify comparison counties that offered the same employment opportunities and incentives for disabled beneficiaries that would have been available to project participants in the absence of SPI. The selection process started with an effort to match State Project counties and comparison counties in terms of 13 county characteristics that are likely to influence the employment of disabled beneficiaries. Once this process identified an initial set of comparison counties, we asked State Project staff whether any unmeasured differences between our initial selections and the State Project counties would create substantially different employment environments. We were particularly interested in whether there were local employment initiatives or substantially different employment support services in the selected comparison counties. From our initial list, we deleted counties that were felt to have substantially different service environments. This left a set of comparison counties well matched to the demonstration counties.

The goal of this process was to control for those environmental factors that influence the employment decisions of SSI and SSDI beneficiaries. As discussed in the next chapter, the design uses an individual matching process and several analysis techniques to control for a wide range of individual characteristics when selecting beneficiaries within the selected comparison areas. In picking comparison areas, we therefore wanted to control for any additional effect that area characteristics might have in shaping beneficiaries' employment decisions. We focused on employment decisions, as the primary goal of the State Projects was to promote employment among disabled beneficiaries. Furthermore, beneficiaries' decisions about working and the hours they work underlie the other primary evaluation outcomes: whether or not beneficiaries continue to receive benefits, the level of any future benefits, and their total income.

The comparison area selection process used the following steps:

• Identify the areas in which the State Projects offered services (the demonstration areas). Generally, services were offered either in a few counties or statewide.

• Identify the broad areas from which the comparison counties will be selected. For projects that offered services in only a few counties or cities, we typically selected comparison counties from the balance of the state. For projects with statewide services, we selected comparison counties from nearby states.

• Make an initial selection of comparison counties by identifying those comparison-area counties that most resemble the demonstration counties in terms of 13 county-level characteristics that affect labor markets. These include population characteristics (such as poverty rate and racial mix), physical characteristics (such as population density and availability of public transportation), and economic characteristics (such as the unemployment rate, the roles of farming and manufacturing in the local economy, and the county-specific employment rate of SSI beneficiaries before the start of a state's project).

• Have State Project staff review the preliminary list and delete counties whose service environments are substantially different from those in the demonstration counties. This step also helps to ensure the face validity of the selected comparison areas to policymakers within each demonstration state.

Overall, the basic elements of this approach appear to have worked. We were able to select, for all demonstration areas, comparison counties that resemble those in which the State Projects operate. While reviews by State Project staff indicated that several selected counties should be dropped because their service environments differed substantially from those in the demonstration counties, we identified acceptable comparison counties for every State Project in the evaluation.

Further evidence of the success of the matching process is the fact that the selected comparison counties contain enough beneficiaries for the individual-level matching to have a sufficiently large pool from which to select a comparison group of beneficiaries.

The specific counties selected as comparison areas change somewhat depending on the weights given to each of the area characteristics in the matching process. Nevertheless, there is a high degree of similarity among counties selected with alternative weights. For example, a process that gives all characteristics an equal weight selects about 70 to 90 percent of the counties selected with the recommended method (which gives each characteristic a weight based on its estimated importance in predicting SSI beneficiaries' employment). Furthermore, there is a similarly high degree of stability when considering just the set of counties that were selected after review by the State Project staff.

In addition, the available data suggest that precise matching along each area characteristic is not particularly important for analyzing the employment of SSI beneficiaries (the group for which we have the most information). Regression analysis indicates that area characteristics generally have a very small effect on the probability that an SSI beneficiary will be employed. While several area characteristics are significantly related to the probability of employment, their marginal effects are very small. For example, the predemonstration employment rate among SSI beneficiaries is the area characteristic most consistently found to be correlated significantly with employment. Yet the estimated marginal effect suggests that a standard-deviation-sized increase in this rate would increase the probability of employment by only 0.2 to 0.8 percentage points, depending on the state.

In contrast, a person's recent employment history is highly predictive of whether an SSI beneficiary will be employed. Being employed at any time during a three-month period increases the probability that an SSI beneficiary will be employed in the subsequent month by about 85 percent. The predictive power of prior employment status reflects the tremendous stability of employment status among SSI beneficiaries. About 87 percent of beneficiaries remain consistently not employed during a four-month period, with another 11 percent being employed consistently. Less than 2 percent obtain or lose employment during such a time.

Therefore, we feel that the final net outcome estimates, which will control for individual characteristics, are likely to be insensitive to the selection of specific comparison counties as long as the selected counties are similar to those in which demonstration services were offered. Furthermore, we believe that this result holds for SSDI as well as SSI beneficiaries. The two groups differ in work histories and program incentives, but individual characteristics will probably play the dominant role in shaping the work efforts of both groups, with county characteristics playing a decidedly secondary role. Nevertheless, so that the final evaluation can test this assumption, the final database will contain information about the alternative set of comparison counties selected using equal weights for each area characteristic.

Sections A through D describe the steps used to select the initial set of comparison counties that were given to the states for review. Section E summarizes the State Project review of the initial selections. Section F reviews the sensitivity of the final selection of comparison counties to alternative approaches. A full list of the demonstration and selected comparison counties is provided in Khan et al. (2002).

A. DEFINITION OF POTENTIAL COMPARISON AREAS

State Projects were implemented in two types of geographic areas: (1) specific cities or counties within a state, or (2) an entire state. For demonstration areas that were specific cities or counties, we generally searched among all nondemonstration counties within the same state to identify those most similar to each of the demonstration counties (Table II.1). For example, to find comparison counties that resembled Kern County, California, we searched among all other California counties (except for the other California demonstration counties) and selected those with employment and service environments similar to that of Kern County. In this process, a comparison county could be matched with more than one demonstration county.

For demonstration areas that encompassed an entire state, we searched for comparison counties in nearby states. In addition, we focused on those states that are served by the same SSA regional office as the demonstration state. Thus, we hoped to control for any regional differences in the way SSA procedures are implemented or SSA administrative data are coded. For example, the Wisconsin project served the entire state. We therefore searched for comparison counties in Michigan and Illinois, which are also served by SSA's Chicago regional office.

The one exception was New Mexico, which implemented its project in a set of counties that essentially included all the highly populated areas of the state. Thus, there were no adequate comparison counties left within New Mexico. We therefore used the procedure for statewide projects and looked for comparison counties in neighboring states.

TABLE II.1

STATES FROM WHICH COMPARISON COUNTIES WERE SELECTED

Demonstration State      States from Which Comparison Counties Were Selected
California               California
Illinois                 Illinois
Iowa (SSA)               Iowa
Minnesota                Illinois, Michigan, Iowa, North Dakota, South Dakota
New Hampshire            New Hampshire
New Mexico               Arizona, Nevada, Oklahoma(a)
New York                 New York
North Carolina           North Carolina
Ohio                     Ohio
Oklahoma                 Oklahoma
Utah                     Arizona, Idaho, Nevada, Wyoming
Vermont                  Maine, Massachusetts, New York(a)
Wisconsin                Illinois, Michigan

(a) Only nondemonstration counties from this state will be used as comparison counties.

This process differs slightly from the one proposed in the project's initial design report (Thornton et al. 2000). We thought originally that it might be possible to select comparison areas for some of the State Projects that operated in a specific city or county from the balance of the labor market area (LMA) that contained the demonstration area.1 This approach was appealing, because the U.S. Department of Labor (DOL) has already determined that people residing within a labor market area face the same general employment opportunities and conditions. However, we decided against using this process as our basic approach, for two reasons. First, it is still unclear whether State Projects are strict about enrolling only people from the specific portion of a city or county. Early in the initiative, State Projects gave the evaluation a list of the ZIP code areas in which they intended to enroll participants. Subsequently, it appears that the State Projects are not being particularly strict about limiting enrollment to those ZIP code areas. In at least two instances (Iowa and New Mexico), the State Projects have formally expanded their demonstration areas from two specific areas to larger, multicounty areas that include virtually all the relevant labor market areas. Thus, we felt that it was not possible to be sure that beneficiaries residing in the balance of a labor market area did not, in fact, have access to State Project services. Second, we were concerned that many projects focused on the urban core of a labor market area, which would have left the suburban ring as the comparison area. Since many people with disabilities have difficulty traveling, and since employment in the suburbs generally requires independent travel by automobile, we felt that the suburbs were not a good comparison area for urban programs.

There are only three exceptions to our general approach. Two arose in Illinois and New York, where we had difficulty finding matches for Chicago and New York City. Those major cities differed so much from all other counties in their respective states that the general approach appeared inappropriate. We instead used a modified version that focused on working with State Project staff to select comparison areas from within the cities' broader labor market areas. The other exception arose in New Hampshire. That state has only 10 counties, so it was impossible to use the general approach, which tries to match counties on 13 characteristics. Again, we implemented a modified approach that focused on the balance of the labor market areas that contained the demonstration areas.

Another change from the initial design report was to select comparison areas only for the 12 SSA-funded State Projects and the Utah project funded by the Rehabilitation Services Administration (RSA). These 13 projects provided the core with the Social Security numbers of participants, which are essential for implementing the core. The remaining five RSA-funded State Projects—Alaska, Arkansas, Colorado, Iowa (RSA), and Oregon—decided not to provide such identifying information and therefore cannot be included in the core, as it will be impossible to distinguish participants from nonparticipants in the SSA administrative files used for the evaluation. None of the RSA-funded states had planned to participate in the core at the time they applied for demonstration funding, and only Utah subsequently modified its plans and provided SSNs to the core evaluation.

1An LMA is "an economically integrated geographic area within which individuals can reside and find employment within a reasonable distance or can readily change employment without changing their place of residence" (U.S. Department of Labor 2000a). In most cases, an LMA is a single county or a group of counties. In a few cases, an LMA is a single town or group of towns. Defining a comparison area as the balance of the LMA that contains the demonstration area is appealing, because it ensures that the demonstration and comparison areas have similar labor markets. However, there is still the issue of whether the balance of an LMA will have the same transportation and policy environment. In particular, we are concerned that the balance of an LMA surrounding a large city may represent a different type of environment, even if it is contained in the same LMA.

B. METHODS AND DATA USED TO SELECT COMPARISON AREAS

The initial selection of comparison counties was done with a nonparametric matching algorithm (Mardia et al. 1979) that identified 5 to 10 potential comparison counties for each county in which a State Project offered services. This method assigned to each potential comparison county a score based on the weighted sum of its similarity rankings for 13 county-level characteristics that affect employment among SSI beneficiaries. The scale is defined so that the potential comparison areas with the lowest scores are those most similar to the demonstration area.

Although numerous measures of similarity exist between geographic units, there are three advantages to a nonparametric matching algorithm over other approaches, such as those that rely on a minimum distance algorithm. First, it addresses the problem of combining nominal, ordinal, and continuous variables. Second, it is robust to extreme values of key area characteristics. Third, it provides considerable flexibility for testing different combinations of either the characteristics or the weights attributed to them.

The selection process used counties as the unit of analysis in the matching algorithm. We felt counties to be appropriate because they are likely to represent a relatively homogeneous labor market and they tend to have enough resident beneficiaries to produce valid statistics for the area (smaller units, such as ZIP codes and census tracts, often have fewer than 10 beneficiaries, particularly in rural areas). Furthermore, substantial statistical information is available at the county level, so it is possible to consider more characteristics than would have been possible if smaller areas had been used.

In implementing this approach, it turned out that several of the 13 area characteristics used in the matching were highly correlated. We decided to drop one of any pair of variables that had a correlation coefficient of 0.70 or more. This avoided giving extra weight to a characteristic by including two variables that essentially measure the same thing.
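
A minimal sketch of this screening step, assuming the candidate county characteristics are numeric columns of a pandas DataFrame and using absolute correlations (the function name and which member of a pair is kept are our choices, not specified by the design):

    import pandas as pd

    def drop_correlated(df: pd.DataFrame, threshold: float = 0.70) -> pd.DataFrame:
        """Drop one variable from any pair whose correlation is at or above the threshold."""
        corr = df.corr().abs()
        to_drop = set()
        cols = list(df.columns)
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                # Keep the first variable of a highly correlated pair.
                if a not in to_drop and b not in to_drop and corr.loc[a, b] >= threshold:
                    to_drop.add(b)
        return df.drop(columns=sorted(to_drop))

In practice, the choice of which member of a correlated pair to retain can be made on substantive grounds rather than column order.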

To implement the matching process, we then used the following steps:

1. For each of the variables, calculate the absolute value of the difference between the variable’s value for each potential comparison county and the corresponding value for the demonstration county.

2. Sort the differences calculated in the first step from smallest to largest and compute the rank of the absolute difference between each potential comparison county and the demonstration county. This rank indicates the relative similarity between each potential comparison county and the demonstration county for a given area characteristic. For example, a rank of 1 for the county unemployment rate indicates that a potential comparison county has the smallest difference between its unemployment rate and the rate observed for the demonstration county.

3. Compute the weighted sum of the ranks of the variables for each potential comparison county. The weights reflect the relative importance of the included variables in explaining employment among SSI beneficiaries. The methods used to estimate these weights are described below.

4. Rank the counties by the weighted sum of the individual-variable ranks, and use the counties with the lowest weighted ranks as the possible comparison area for the associated demonstration county. In general, we selected the five counties that best matched as our initial list of comparison areas to be reviewed by State Project staff. In this process, some comparison counties were matched to more than one demonstration area.

5. Repeat this process for each demonstration county. (A sketch of the scoring computation in steps 1 through 4 appears below.)
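
The following sketch implements steps 1 through 4 for a single demonstration county, assuming the county characteristics are stored in a pandas DataFrame indexed by county; the variable names and weights in the usage comment are placeholders, not the estimated weights described in Section C:

    import pandas as pd

    def comparison_scores(candidates: pd.DataFrame,
                          demo_county: pd.Series,
                          weights: dict) -> pd.Series:
        """Weighted sum of similarity ranks; lower scores indicate better matches."""
        total = pd.Series(0.0, index=candidates.index)
        for var, weight in weights.items():
            diff = (candidates[var] - demo_county[var]).abs()  # step 1
            total += weight * diff.rank(method="min")          # steps 2 and 3
        return total.sort_values()                             # step 4

    # Usage (placeholder variable names and weights):
    # scores = comparison_scores(candidates, demo, {"unemp_rate": 0.3, "pop_density": 0.1})
    # initial_list = scores.index[:5]   # the five best-matched counties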

In matching counties, we focused on 13 measures that reflect 10 general characteristics of the labor environment faced by SSA beneficiaries. In general, we measured characteristics in June 1999 (or for the 12 months ending in June 1999), approximately the first month of State Project enrollment. When a measure was unavailable for June 1999, we used data from the most recent preceding year. While some State Projects (notably California) enrolled a few participants earlier in 1999, it is extremely unlikely that the projects could have had an effect on county-level characteristics before June. Thus, we matched counties in terms of their predemonstration characteristics. The characteristics of the demonstration and selected comparison counties can be tracked over time. If substantial differences emerge, the analysis could control for the differences in the regressions used to obtain impact estimates. The 13 measures, grouped into the 10 general characteristics, are described in the following sections.

1. Population Density

This measure is the number of people living in the county divided by the county’s area measured in square miles. It provides an index of the extent to which the county is urban, suburban, or rural in nature and the likely density of employment and services. We obtained population density measures from the U.S. Bureau of the Census (2000c), whose figures are based on 1990 data.

2. Population Growth

The growth rate in county population is used to reflect the relative growth in economic opportunities among counties. Counties with rapidly increasing populations are expected to offer a more dynamic economic environment than those with low or no population growth. This variable is calculated as the percentage change in a county’s population from April 1, 1990, to July 1, 1999. The source for this information is the Populations Estimates Program in the Populations Division of the U.S. Bureau of the Census (2000b).

3. Unemployment Rate

We used two measures to capture each county’s unemployment situation, which was expected to affect the employment opportunities available to disabled beneficiaries.2 Data for these measures came from monthly county-level unemployment rates reported by DOL, Bureau of Labor Statistics (2000b).3 The two measures are:

• Unemployment Rate. The average unemployment rate during the 12 months from July 1998 through June 1999 captured the relative economic condition of the counties at the general time the State Projects started.

• Unemployment Volatility. The difference between the maximum and minimum monthly county unemployment rates observed between July 1998 and June 1999, expressed as a percentage of the minimum rate, captured the extent to which job opportunities fluctuated during the year. For example, counties with high levels of agricultural or tourism jobs are likely to have high unemployment volatility. (Both measures can be computed directly from the monthly series, as sketched below.)
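
A brief sketch of both computations, assuming monthly_rates holds a county's 12 monthly unemployment rates for July 1998 through June 1999 (the values shown are illustrative):

    # Twelve monthly unemployment rates, July 1998-June 1999 (illustrative values).
    monthly_rates = [4.1, 4.3, 4.0, 3.8, 3.9, 4.5, 4.8, 4.6, 4.2, 4.0, 3.9, 3.7]

    # Measure 1: the average rate over the 12 months.
    unemployment_rate = sum(monthly_rates) / len(monthly_rates)

    # Measure 2: the max-min spread as a percentage of the minimum rate.
    volatility = 100.0 * (max(monthly_rates) - min(monthly_rates)) / min(monthly_rates)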

4. Total County Employment

We used two variables to describe the total size of the labor market facing beneficiaries in each county. These variables helped us match counties in terms of the total volume of jobs available.

• Total Employment. Total employment in the county during June 1999 is a direct measure of the total number of jobs.

• Employment Growth. The percentage growth in total county employment between July 1998 and June 1999 is used to capture the extent to which employment opportunities are increasing in a county.

These variables were measured using data from the DOL, Bureau of Labor Statistics (2001).

2We had originally included a third unemployment variable to capture the general trend in unemployment rates during the 12 months prior to June 1999. However, that variable did not seem to add much predictive power to our process, so we dropped it from the final analysis.

3DOL computes these rates using a "handbook method," a series of computational steps formulated to produce local employment and unemployment estimates (U.S. Department of Labor 1997). The county-level unemployment rates used in this analysis are not seasonally adjusted, but they are adjusted for consistency with the more accurate state estimates.

5. Percentage of County Land in Farming

This variable reflects the role of agricultural employment in a county’s labor market. It is the ratio of a county’s acreage of farmland to its square miles of total land (U.S. Department of Agriculture 2000).

6. Presence of Substantial Manufacturing

This binary variable was used to reflect the role of manufacturing in a county’s economy during March 1997. Disabled beneficiaries may face different employment opportunities in counties with different levels of manufacturing activity. It takes on a value of 1 if there is substantial manufacturing activity in the county and a value of 0 otherwise (estimates of manufacturing employment were obtained from the 1997 Economic Census [U.S. Bureau of the Census 2002b]). We classified counties as having substantial manufacturing activity when the proportion of the labor force engaged in manufacturing was more than the median value for all counties included in the comparison-area selection process for a State Project. This measure was used instead of a direct measure such as the number of manufacturing jobs, because the Census does not report jobs data in counties with few employers.
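
A one-line sketch of this classification, assuming mfg_share is a pandas Series of county manufacturing employment shares for the counties in a State Project's selection pool (the function and variable names are ours):

    import pandas as pd

    def substantial_manufacturing(mfg_share: pd.Series) -> pd.Series:
        # 1 when a county's manufacturing share exceeds the median share
        # among all counties in the selection pool for that State Project.
        return (mfg_share > mfg_share.median()).astype(int)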

7. Public Transportation Use

This variable was used to capture the availability of public transportation, which is expected to make it easier for people with disabilities to travel to jobs and services. It is the percentage of people who were at work during a census reference week and who used public transportation to get to their job (U.S. Bureau of the Census 1995). Public transportation includes bus, trolley, subway, railroad, ferryboat, and taxicab.

8. Poverty Rate

This variable was included to measure the local socioeconomic conditions. It also indicates the relative size of the potential pool of low-income workers who often compete with people with disabilities. The poverty rate measure is based on estimates for people of all ages and their income during 1995 (U.S. Bureau of the Census 2000a).

9. Percentage of County Population in Racial/Ethnic Minorities

Two variables were used to help match counties in terms of their racial and ethnic population characteristics. These data were obtained from the U.S. Bureau of the Census (2002a).

• Percentage of County Population of Hispanic or Latino Origin

• Percentage of County Population Not White and Not of Hispanic or Latino Origin

10. SSI Beneficiary Employment Rate

This measure tries to capture the extent to which the local employment market and service environment promote employment among people who receive disability benefits. It is measured using tabulations from SSA REMICS administrative data files and reflects the employment rate of SSI beneficiaries in a county during March 1999, which is before State Projects could have affected a county's service or employment environment.4

4The REMICS files are produced each month and contain information for all SSI beneficiaries on the rolls that month. In particular, they contain information about earnings in the month, use of SSI work incentives, and benefit payments. These files are extracts from the basic administrative file used to run the SSI program, the Supplemental Security Record (SSR). We provide more information about these variables in Agodini et al. (2001).

11. Other Area Characteristics Considered

In addition to the 13 variables we included, we considered two measures of beneficiaries’ participation in SSA disability programs: (1) the average monthly SSI benefit payment (including all state supplements), and (2) the county-level SSI and SSDI exit rates among beneficiaries prior to the demonstration. We control for benefit levels as part of the process for matching individuals; thus, we did not include these measures in the county-matching process. We excluded county-specific exit rates because we felt that they were not important for describing the employment environment that disabled beneficiaries face. Furthermore, exit rates appear to be determined more at the state or SSA region level, where decisions are made about continuing disability reviews, than at the county level.

We also investigated several measures of service availability. These included the relative generosity of the state Medicaid program (as measured by per capita Medicaid spending), the general availability of vocational services (measured by the number of clients in the state vocational rehabilitation (VR) system per 1,000 residents or the per capita annual VR expenditures by state or county), and the general availability of mental health services (measured by per capita annual mental health expenditures). We did not use these measures, however, for several reasons. First, we felt that relative levels of the predemonstration employment rate for SSI beneficiaries in a county reflect relative differences in the employment supports available in different counties. Second, we asked State Project staff to consider overall service system differences when they reviewed our initial selections for comparison areas, and such reviews need not be limited to a specific set of variables. Finally, these other measures often reflect phenomena that are more state- or regional-specific than county-specific, and therefore would not add much to our county-matching process.

C. ESTIMATING WEIGHTS FOR THE COUNTY CHARACTERISTICS

In computing the similarity between two counties, we gave each county characteristic a weight based on its relative importance in explaining beneficiary employment.

4The REMICS files are produced each month and contain information for all SSI beneficiaries on the rolls that month. In particular, they contain information about earnings in the month, use of SSI work incentives, and benefit payments. These files are extracts from the basic administrative file used to run the SSI program, the Supplemental Security Record (SSR). We provide more information about these variables in Agodini et al. (2001).


To compute these weights, we began by estimating a logit regression model of SSI beneficiary employment in June 1999 as a function of individual beneficiary characteristics and the county characteristics.5 We used beneficiary employment as the dependent variable because it is the underlying mechanism through which SPI expects to produce all other key outcomes. In particular, the State Projects expect to influence SSI/SSDI participation and benefit levels by increasing employment and earnings. We focused on June 1999 as a time period before the State Projects could have influenced the overall employment opportunities and patterns of SSI beneficiaries. The individual-level independent variables included whether a beneficiary worked at any time during the three prior months, receipt of SSDI benefits, receipt of other unearned income, race, age, gender, and primary disabling condition. These variables were selected based on the prior literature about employment among people with disabilities, subject to the availability of information in the REMICS files.
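A minimal sketch of this weighting regression is shown below, using synthetic data and placeholder variable names rather than actual REMICS fields or the report’s estimates:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic beneficiary-level file; column names are illustrative placeholders.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "worked_prior_3mo": rng.integers(0, 2, n),   # individual characteristics
    "ssdi": rng.integers(0, 2, n),
    "male": rng.integers(0, 2, n),
    "unemp_rate_z": rng.normal(size=n),          # standardized county characteristics
    "poverty_rate_z": rng.normal(size=n),
    "ssi_emp_rate_z": rng.normal(size=n),
})
# Simulate an employment outcome dominated by prior work, as in the report.
logits = -2.5 + 3.0 * df["worked_prior_3mo"] + 0.2 * df["ssi_emp_rate_z"]
df["employed"] = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Logit of employment on individual plus county characteristics.
X = sm.add_constant(df.drop(columns="employed"))
fit = sm.Logit(df["employed"], X).fit(disp=False)
print(fit.params)                    # county coefficients feed the area weights
print(fit.get_margeff().summary())   # marginal effects, analogous to Table II.3
```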

Ideally, this analysis would have included both SSI and SSDI beneficiaries. As it is, we included only SSI beneficiaries and those who concurrently receive SSI and SSDI benefits, since more detailed monthly data were available for SSI beneficiaries. In particular, we wanted to have measures of unearned income and employment in the months just prior to June 1999, variables that are unavailable in SSA administrative data for SSDI beneficiaries.

While the two groups of beneficiaries are likely to have different employment behavior, the results of the SSI-based regression models nevertheless appear useful for area matching. SSDI beneficiaries have more extensive work histories than those on SSI (because they need to have earned income in enough calendar quarters to establish SSDI eligibility) and therefore are likely to face different opportunities in the labor market. Furthermore, the work incentives differ substantially between the SSDI and SSI programs, and those differences are also likely to lead to different employment behaviors. These differences mean that the estimated effects of area and individual characteristics on employment for SSI beneficiaries are likely to be unrepresentative of the effects that would be observed for SSDI beneficiaries. However, this will create problems for area matching only to the extent that local environments that support employment for SSI beneficiaries differ from those that support employment for the SSDI beneficiaries. The primary purpose of including the predemonstration employment rate in the area-matching model is to find counties that are similarly conducive to employment for disabled beneficiaries. We have assumed that counties that have higher rates of employment for SSI beneficiaries will also have higher employment rates for the SSDI beneficiaries in that county. Thus, it is the relative ranking of counties in terms of their employment that matters, not the specific estimates. We feel that comparisons among counties are likely to be less sensitive to our focus on SSI beneficiaries than are the specific coefficient estimates in the regression models. Nevertheless, it is possible that the exclusion of SSDI beneficiaries from the regressions will lead us to assign weights to area characteristics that fail to capture fully the relative importance of those characteristics in determining employment among SSDI beneficiaries. We have therefore identified an alternative set of comparison counties that are selected by giving each characteristic an equal weight. These counties provide a means for the final evaluation to test the sensitivity of net outcome estimates to the specific comparison counties used.

5As noted earlier, we dropped one of each pair of variables that had a correlation coefficient greater than 0.70. Thus, the models included fewer than 13 area characteristics.



We used data from the REMICS files for June 1999 to measure the dependent variable, whether or not an SSI beneficiary was employed during the month (measured as the presence of reported earnings in the SSA records for that month). That REMICS file was also used to measure the other individual-level independent variables, except for employment during the three preceding months (March through May 1999), which was measured using data from the REMICS files for those months. We used the REMICS files because they include all earnings, even earnings below the $65 per month disregard and earnings that have been sheltered through the use of a work incentive such as an Impairment-Related Work Expense (IRWE) or Plan to Achieve Self-Support (PASS).

We standardized all the area characteristics to have a mean of zero and a standard deviation of 1, to ensure that they were measured on comparable scales. We then used the relative size of the regression coefficients for the standardized area characteristics as the weights for matching demonstration and comparison areas in the otherwise nonparametric matching approach. As discussed in the next section, we estimated the regression model separately for each State Project and its potential comparison areas, to allow for differences in the effect of area characteristics. The sample used for the regression included all beneficiaries in the demonstration state plus, in the case of statewide Projects, beneficiaries from the neighboring states from which comparison counties were to be selected (Table II.1).
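The standardization and weighting steps can be illustrated as follows; the county values and coefficient magnitudes are placeholders, not the estimates reported in Table II.3:

```python
import pandas as pd

# Hypothetical county-by-characteristic matrix (placeholder values).
area = pd.DataFrame(
    {"unemp_rate": [7.7, 3.1, 5.5], "poverty_rate": [13.6, 6.4, 19.6],
     "ssi_emp_rate": [11.9, 15.2, 9.0]},
    index=["county_a", "county_b", "county_c"],
)

# Standardize each characteristic to mean 0, SD 1 so scales are comparable.
z = (area - area.mean()) / area.std()

# Convert the standardized logit coefficients into matching weights by
# taking their relative magnitudes (the report presents these as percentages).
coefs = pd.Series({"unemp_rate": -0.0027, "poverty_rate": 0.0016, "ssi_emp_rate": 0.0025})
weights = coefs.abs() / coefs.abs().sum()
print(z.round(2))
print(weights.round(3))
```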

The 13 State Projects for which we selected comparison counties operate in areas characterized by a wide array of county characteristics (Table II.2). Not only do the mean levels of county characteristics vary among states, but there is also considerable variation within states. For example, county-level unemployment rates in California ranged from 2 percent to 25 percent in June 1999 (25 percent of California’s 58 counties had unemployment rates below 3.4 percent, and 25 percent had rates above 11.5 percent). There is also considerable variation in the characteristics of the SSI beneficiaries in the 13 states. For example, the average employment rate among SSI beneficiaries from March to May of 1999 ranged from almost 28 percent in Iowa to just under 8 percent in North Carolina.

Many of the 13 county-level characteristics were highly correlated, particularly in states like New York and Illinois, where a large metropolitan area differed from the rest of the state with respect to several characteristics. For example, in Illinois the variable measuring the size of the county labor force was almost perfectly correlated with the variables for population density, the availability of public transportation, and the percentage of the county devoted to farming. More broadly, poverty rates were highly correlated with the percentage of the population that was nonwhite or Hispanic and, in many counties, with the rate of population growth.

As a result, we excluded from the logit model one variable from each pair that was highly correlated, using a correlation coefficient of 0.70 as the threshold. This led us to eliminate the variable for the size of the county labor force from the logit models used for all of the State Projects. We also dropped additional variables from the models used for 11 of the 13 states (Table II.3 indicates which variables were included in each state-specific model).
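One way to implement this screening is a simple greedy pass over the correlation matrix. The report does not specify which member of a correlated pair was retained, so the keep-first rule in the sketch below is an assumption:

```python
import pandas as pd

def prune_correlated(df, threshold=0.70):
    # Keep a variable only if its |correlation| with every already-kept
    # variable is at or below the threshold (keep-first rule; an assumption).
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept

# Example with placeholder data: x and y are nearly collinear, so y is
# dropped; z is retained.
demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1.1, 2.0, 3.2, 3.9], "z": [4, 1, 3, 2]})
print(prune_correlated(demo))  # ['x', 'z']
```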

TABLE II.2

CHARACTERISTICS USED IN THE COUNTY MATCHING PROCESS

Means (Standard Deviations)

[State-by-state means and standard deviations for California, Illinois, Iowa (SSA), Minnesota, New Hampshire, New Mexico, New York, North Carolina, Ohio, Oklahoma, Utah, Vermont, and Wisconsin, covering the area characteristics (population density, population growth, unemployment rate, unemployment volatility, total employment, annual growth in labor force, percent of county land in farming, presence of substantial manufacturing, public transportation use, poverty rate, percent of population Hispanic, percent of population non-white, and SSI employment rate) and beneficiaries’ individual characteristics (male; receipt of SSDI benefits; receipt of other unearned income; race/ethnicity; sex and age interactions; primary disabling condition; and employment in the prior three months).]

SOURCE: REMICS files for the months March 1999 through June 1999.

aFigures for these states are based on all counties in the demonstration state and selected comparison states. Table II.1 indicates the specific comparison states included for the demonstration state.

bStatistics about individual characteristics show the fraction of the population for each of the characteristics.


The number of characteristics included directly in the models ranged from 5 to 11, with 9 of the state models including 9 or more variables.

The models have very high predictive power, primarily because of the extraordinary stability of employment status over time. In particular, almost 90 percent of the nearly 785,000 SSI beneficiaries included in our analyses were not employed in any month from March through June 1999 (Table II.4). In addition, about 10 percent were employed throughout those months. Less than 2 percent changed employment status during those months. As a result, models that include a variable measuring individual-level employment history will be very accurate in predicting employment in a subsequent month.

The small estimated size of the marginal effects for the county characteristics suggests that they exert little effect on the probability that an SSI beneficiary will be employed, once we have controlled for individual characteristics (Table II.3). Furthermore, few of the county characteristics (with the notable exception of the predemonstration SSI employment rate) are statistically significant. As noted, once we have included whether a beneficiary had been employed at some time during the previous three months, there is often little additional variation to explain. The small marginal effects indicate that differences in county characteristics, such as unemployment or poverty rates, would have to be very large before we would be likely to notice any difference in the employment of similar SSI beneficiaries. For example, the estimated marginal effect for the unemployment rate in California, -0.003, suggests that in a county where unemployment was higher by a standard deviation (6.9 percentage points, as shown in Table II.2), the probability that an SSI beneficiary would be employed would be lower by only 0.3 percentage points (after controlling for individual characteristics). Thus, while the county characteristics appear to play some role in whether SSI beneficiaries gain employment, individual characteristics (particularly, whether a person had prior employment) dominate.

The regression-based weighting process tends to emphasize a few characteristics for each State Project. In 9 of the 13 projects, the estimated coefficients place more than 60 percent of the weight on three or fewer characteristics. For example, in New York, the poverty rate alone gets a weight of 55.2 percent in the area matching, with the county unemployment rate getting an additional 13.5 percent of the total weight. Thus, while the approach started with 13 characteristics, the final selection for most states is heavily influenced by only 3 or so characteristics. This is not necessarily problematic, particularly since the factors that get the greatest weight (the SSI beneficiary employment rate, the overall unemployment rate, and the poverty rate) have a lot of intuitive appeal.

The predemonstration SSI employment rate generally received the greatest weight among the area characteristics (Table II.3). It received a weight of more than 20 percent in the matching process for 9 of the 13 states (these are all the states in which the variable was included in the model; elsewhere it was excluded because it was highly correlated with another variable). The overall county unemployment rate was also important in several states, receiving a weight of at least 10 percent in 5 states, while unemployment volatility received such a weight in 4 states. The county poverty rate also received a lot of weight in many states (for example, more than 50 percent in New York, Ohio, and Vermont), but a very low weight in Minnesota, North Carolina, and Utah.

TABLE II.3

MARGINAL EFFECTS AND WEIGHTS FOR COUNTY CHARACTERISTICS RELATED TO SSI BENEFICIARY EMPLOYMENT

[State-by-state estimates for California, Illinois, Iowa, Minnesota, New Hampshire, New Mexico, New York, North Carolina, Ohio, Oklahoma, Utah, Vermont, and Wisconsin, covering population density, population growth, unemployment rate, unemployment volatility, annual growth in labor force, percent farming, percent manufacturing, public transportation, poverty rate, percent Hispanic, percent non-white, and the SSI employment rate.]

SOURCE: Logit regressions based on REMICS data from March through June 1999. Blank cells indicate that the variable was excluded from that state’s regression because it was highly correlated with another variable included in the estimation equation. All but five variables were excluded from the New Hampshire equation because it has only 11 counties, including the 3 counties in which the State Project operates.

NOTE: The top number within each cell is the estimated marginal effect of the associated area characteristic on SSI beneficiary employment for each state. The bottom number is the weight given that characteristic when selecting comparison counties for that state’s demonstration counties.

aIn these states the logit regression was run using data from the demonstration state and all of the comparison states selected for that state (see Table II.1). In the other states, the regression was run using data from all counties in the demonstration state.

*Significantly different from zero at the .10 level, two-tailed test.
**Significantly different from zero at the .05 level, two-tailed test.
***Significantly different from zero at the .01 level, two-tailed test.


TABLE II.4

SSI BENEFICIARIES’ EMPLOYMENT STATUS IN JUNE 1999 AND IN THE PREVIOUS THREE MONTHS

(Percentages)

State             Unemployed in    Employed in     Gained    Lost    Sample
                  Both Periods     Both Periods    Job       Job     Size

California            89.7              8.8         0.7      0.7     261,871
Illinois              89.8              8.5         0.8      0.8      91,835
Iowa                  70.6             25.6         1.9      1.9      17,549
Minnesota             73.1             23.8         1.7      1.4      24,589
New Hampshire         81.3             15.6         1.4      1.7       5,619
New Mexico            91.8              6.7         0.6      0.8      16,241
New York              87.3             11.0         0.9      0.9     148,590
North Carolina        91.5              6.9         0.8      0.8      63,909
Ohio                  87.3             10.7         1.0      1.1      84,624
Oklahoma              90.7              8.2         0.6      0.6      24,908
Utah                  81.2             15.8         1.8      1.3       7,800
Vermont               85.3             11.7         1.3      1.7       5,549
Wisconsin             76.5             20.5         1.4      1.7      31,901

13-State Total        87.6             10.6         0.9      0.9     784,985

SOURCE: Tabulation from monthly REMICS files for March 1999 through June 1999.


The percentage of commuters using public transportation was not estimated to play an important role and did not get a substantial weight for any state. This likely reflects correlation between the public transportation variable and several others (particularly population density) rather than a complete lack of effect of public transportation on employment.

D. PRELIMINARY LIST OF COMPARISON COUNTIES

The matching process was used to select the 10 most similar comparison counties for each demonstration county in states that ran substate projects (Table II.1); 5 comparison counties were selected for counties in statewide projects. The lower number used for the statewide projects reflects the much larger number of demonstration counties in such projects. Comparison counties were selected with replacement. A total of 404 comparison counties were selected for the 251 demonstration counties. Khan et al. (2002) contains the full list of demonstration and selected comparison counties.
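A sketch of this selection step appears below. The weighted absolute-difference score and the variable names are illustrative assumptions; the report refers to weighted similarity scores without giving a formula at this point:

```python
import numpy as np
import pandas as pd

def closest_counties(demo_row, candidates, weights, k):
    # Weighted absolute difference on standardized characteristics;
    # a smaller score means a more similar county.
    scores = (candidates.sub(demo_row).abs() * weights).sum(axis=1)
    return scores.nsmallest(k).index

rng = np.random.default_rng(0)
cols = ["unemp_rate_z", "poverty_rate_z", "ssi_emp_rate_z"]  # placeholder names
demo = pd.DataFrame(rng.normal(size=(2, 3)), columns=cols, index=["demo_1", "demo_2"])
pool = pd.DataFrame(rng.normal(size=(20, 3)), columns=cols)
weights = pd.Series({"unemp_rate_z": 0.3, "poverty_rate_z": 0.2, "ssi_emp_rate_z": 0.5})

# "With replacement": each demonstration county draws from the full pool,
# so the same candidate can appear in several matched sets.
for d in demo.index:
    print(d, list(closest_counties(demo.loc[d], pool, weights, k=5)))
```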

The results for California are intriguing because the sets of counties selected to match the two demonstration counties do not overlap. This suggests that the two California demonstration counties had substantially different employment environments. In contrast, many of the comparison counties selected in North Carolina were chosen as a match for both demonstration sites of that State Project. Given the overlap between the two sets of comparison counties, it appears that selection of comparison beneficiaries from a single set of comparison counties would be possible.

For Chicago, New Hampshire, and New York, we felt that our standard approach was inadequate. We instead made the final selection of comparison areas from within the labor market area where demonstration services are offered. For example, the Illinois project served students in only a few Chicago schools, so we worked with the staff from that project to identify other areas of Chicago that could be used for comparison. (The other areas the Illinois project serves could be matched effectively with our standard approach.) We did the same in New Hampshire, which concentrated services in specific towns within its counties.

Because New York recruited participants from throughout New York City, there were no ZIP code areas in the city where we could be sure that beneficiaries had not been at least offered project services. Therefore, we worked with State Project staff to identify neighboring counties that would provide a close match in terms of service environment and employment opportunities for people with disabilities. Two adjoining counties, Westchester and Nassau, were selected as the comparison areas for New York City.

We intended to make the modified procedures for Chicago, New Hampshire, and New York generally consistent with the overall comparison area selection process. That is, we identified the specific areas where demonstration services were fielded and then tried to identify similar areas from within the remaining parts of the cities or state. The major difference between this process and the one used elsewhere is that the general procedure relies first on statistical matching and then on State Project staff judgments, while the special procedures relied more on discussions with the project staff.


E. FINAL SELECTION OF COMPARISON AREAS

The final step in the comparison area selection process was to ask State Project staff to review our initial selections. Project staff were given a list of their demonstration counties and the list of the 5 to 10 best matches for each demonstration county. The goal of this review was to pick up policy or environmental differences that cannot be measured well with available statistics. When State Project staff provided a clear reason that an area selected by the quantitative process was not a good match, we dropped that county from the set matched to the associated demonstration county.

In general, State Project staff rejected few of the initial selections. The following sections provide details about how this review process changed our final selections.

1. California

Because the California State Project targets beneficiaries with mental illness, staff felt it was important that the comparison counties have a mental health service system similar to the one in the two demonstration counties (Kern and San Mateo). The key feature of that system was the presence of a Department of Mental Health and Department of Rehabilitation cooperative program. These cooperative programs provide a single source for employment and treatment assistance for county residents with severe mental illness. Project staff reviewed the initial list of 20 counties (10 for each of the two demonstration counties), identifying 3 for Kern County and 8 for San Mateo County that had cooperative centers similar to those in the demonstration counties. We used all three of the selected comparison counties for Kern. In the case of San Mateo, we used just 5 of the 8 selected, keeping the ones that were most similar to the demonstration county in terms of their characteristics (that is, they had the lowest weighted similarity scores).

2. Illinois

We used the general selection approach for matching the demonstration areas outside Chicago. After consulting with project staff, we identified 7 suitable comparison counties for those 4 demonstration counties. In Chicago, the project targeted specific schools, so we selected as the comparison area the Chicago ZIP code areas that lie outside the areas those schools serve.

3. Iowa (SSA)

The Iowa State Project adopted a method for selecting comparison areas that was similar to the one we developed. Project staff matched counties on several characteristics and selected the most closely matched counties as comparisons for each demonstration county. After reviewing our list of selected counties and the factors we used to select them, they came to the conclusion that, while our selections differed slightly from the ones they were using for their evaluation, both sets were reasonable matches. We included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties.


Therefore, we have used the set of 41 unduplicated comparison counties we identified for Iowa’s 14 demonstration counties.

4. Minnesota

Because the SPI demonstration in Minnesota is statewide, we selected comparison areas from neighboring states (Michigan, Illinois, and nondemonstration sites in Iowa). State Project officials, while not familiar with the characteristics of particular counties within these states, agreed with our assumption that the three would generate good matches for Minnesota counties. They particularly thought that Michigan, where most of the comparison counties were selected, would provide good matches. We included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties. Therefore, we have used the set of 169 unduplicated comparison counties we identified for Minnesota’s 87 demonstration counties.

5. New Hampshire

The New Hampshire demonstration sites are in the cities of Keene (Cheshire County), Manchester (Hillsborough County), and Derry and Portsmouth (Rockingham County). Because New Hampshire has only 10 counties, we could not use our statistical matching approach (which uses up to 13 characteristics to select comparison counties for those 3 demonstration counties). We decided instead that our initial list should include the 7 nondemonstration counties and the portions of the 3 demonstration counties where services were not offered. State Project staff reviewed this initial list and suggested the following comparison areas for the four demonstration cities: Nashua (Hillsborough County), Salem (Rockingham County), Concord (Merrimack County), and Lebanon and Hanover (Grafton County).

6. New Mexico

While the New Mexico project targets specific areas within the state, those areas account for most of the state’s urban areas. Therefore, after discussions with State Project staff, we decided to draw the comparison counties from Arizona and Nevada and from the nondemonstration counties in Oklahoma. We included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties. A total of 35 unduplicated counties were selected as comparison areas for the 12 demonstration counties.

7. New York

The standard approach proved to be somewhat problematic in New York, although we were able to select comparison counties for the two demonstration sites, Buffalo (Erie County) and New York City.


We used the standard approach for selecting comparison counties for New York’s Buffalo demonstration site (Erie County). That approach identified 10 possible matches for Erie County, and the State Project staff felt that only one of those counties, Oneida, would be an acceptable match. The reason for rejecting the other nine initial selections was that they were much more rural than Erie County and therefore had different types of service systems. Those counties appear to have been selected because the regression-based characteristic weights used in New York placed a heavy emphasis on the county poverty and unemployment rates, but no weight on population density (Table II.3). Population density was excluded from the New York model because it was highly correlated with the poverty rate. We used Oneida County as the comparison area for Erie County, but we also noted that this selection was sensitive to the weights given to county characteristics. In particular, Oneida County would not have been on the list of counties selected if we had given equal weight to all characteristics, although several other counties that the State Project would have picked over Oneida were on that list. The difficulties in matching this county do not appear to create serious problems for the overall SPI evaluation, but they should be kept in mind if this design were applied to other demonstrations.

We selected comparison sites for New York City on the basis of discussions with the director of the New York project. While there is no perfect match for New York City, we decided to use Westchester and Nassau counties, which adjoin New York City and share much of the same labor market.

8. North Carolina

In North Carolina, project staff reviewed our initial list of 11 comparison counties that had been matched to their 2 demonstration counties (Mecklenburg and Wake). On that list, they identified two counties that were fielding services similar to those in the State Project. We dropped those counties from our list of comparison counties for that state. Specifically, we dropped Durham and Guilford because North Carolina was using the SSA-funded Benefits Counseling, Planning and Outreach Program to deliver services in those counties that were similar to key SPI services. Therefore, the experience of beneficiaries in these counties would not be indicative of what would have happened to beneficiaries in the demonstration areas in the absence of SPI. That left a total of nine possible unduplicated counties, from which we chose the five best matches (using our statistical matching method) to be used as comparison areas for the two demonstration counties in North Carolina.

9. Ohio

In Ohio, State Project staff reviewed our initial list of statistically matched comparison counties, the 10 best matches (using our statistical matching method) for each of the 4 demonstration counties. Because many of the comparison counties matched to more than 1 demonstration county, we reviewed a total of 29 unduplicated counties as possible matches. State Project staff reviewed the list and felt that only a few counties would serve as good comparison areas for the demonstration counties. Specifically, for Franklin County, only two counties (Hamilton and Summit) were selected as good matches by the state and by our statistical matching process. For the remaining demonstration counties, only one comparison county was selected for each: for Lucas County, Mahoning; for Montgomery County, Summit; and for Portage County, Licking. Thus, a total of four unduplicated counties were selected as comparison areas for the four demonstration counties.



10. Oklahoma

In Oklahoma, project staff reviewed our initial list of statistically matched comparison counties. For Oklahoma County, which contains Oklahoma City, they selected only Osage, Wagoner, and Rogers counties. The other seven counties we proposed are extremely rural and thus not comparable. State Project staff also suggested that since the demonstration is running only in northern Oklahoma City, the best match for their Oklahoma County site would be southern Oklahoma City, which can be identified using ZIP codes. The proposed comparison counties for the other demonstration counties were thought to be acceptable matches. We included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties. Therefore, a total of 17 unduplicated counties, plus southern Oklahoma City, were selected as comparison areas for the 4 demonstration counties.

11. Utah

The Utah program is statewide, so we focused comparison area selection on the neighboring states of Arizona, Idaho, Nevada, and Wyoming. As with Minnesota, State Project staff were not familiar with the details of the service systems in these states but agreed with our assumption that they would provide a good source for matching Utah counties. We included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties. We therefore used a set of 63 unduplicated comparison counties for the 29 counties in Utah.

12. Vermont

Vermont project staff felt that the selection process would work well as long as comparison counties were selected only from Maine, Massachusetts, and the nondemonstration counties in New York. We had already excluded parts of the neighboring state of New Hampshire from consideration, because it was fielding an SPI project of its own.

While project staff had no detailed knowledge of the county-level service systems in Maine, New York, and Massachusetts, they felt that the counties on our initial list would serve as a reasonable area from which to select comparison beneficiaries. We therefore included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties. Because many of the comparison counties were matched to more than one demonstration county, a total of 43 comparison counties were selected for the 14 demonstration counties.


13. Wisconsin

Wisconsin project staff felt that the selection process would work well as long as comparison counties were selected only from Michigan and Illinois. We had already excluded the neighboring state of Minnesota from consideration, because it had a statewide project of its own. We considered selecting comparison counties from North Dakota and South Dakota, but project staff felt that there were too many service and economic differences between those states and Wisconsin for policymakers to accept them as valid comparison areas.

While project staff had no detailed knowledge of the county-level service systems in Michigan and Illinois, they felt that the counties on our initial list would serve as a reasonable area from which to select comparison beneficiaries. Therefore, we included, as our final set of comparison counties, the five best matches (using our statistical matching method) for each of the demonstration counties. Because many of the comparison counties were matched to more than one demonstration county, a total of 86 comparison counties were selected for the 73 demonstration counties.

F. CONCLUSIONS AND ROBUSTNESS OF THE SELECTIONS

The counties selected as potential comparison areas have more than enough SSI and SSDI beneficiaries for us to select comparison beneficiaries (Table II.5). Our goal was to have at least five times as many beneficiaries in the comparison areas as a State Project had participants. In fact, the selected comparison areas contain at least 10 times as many beneficiaries as participants, and often more than 100 times as many. The adequacy of the selected pool of potential comparison group members is reinforced by the results from the individual-level matching process described in Chapter III.

To assess the sensitivity of the comparison county selection process to several key features, we analyzed the following:

• Giving each county characteristic equal weight rather than assigning weights based on the relative size of the regression coefficients. This test helped to assess the sensitivity of the final county selections to the use of specific weights.

• Selecting comparison counties on the basis of only those county characteristics that had statistically significant coefficients in the logit models. This test helped to assess the sensitivity of county selections to the exclusion of characteristics that may not influence SSI beneficiary employment in a specific state even if those characteristics may be influential in another state.

• Including only SSI beneficiaries with mental illness in the logit model for those states that target only beneficiaries with mental illness: California, New York, Ohio, and Oklahoma. This test helped to assess whether area characteristics play different roles for this specific group of beneficiaries from those they play for the overall set of SSI beneficiaries.


TABLE II.5

ADEQUACY OF THE BENEFICIARY POPULATION IN THE MATCHED COMPARISON AREAS FOR SSI AND SSDI BENEFICIARIES

                        SSI Beneficiaries                 SSDI Beneficiaries
                   Projected     Beneficiaries      Projected     Beneficiaries
                   Project       in Matched         Project       in Matched
State              Enrollment    Comparison Area    Enrollment    Comparison Area

California             155           115,556             79            89,365
Illinois               196            18,921             37            55,715
Iowa (SSA)             307             9,801            353            29,494
Minnesota              160            95,745            179           271,979
New Hampshire           54             7,202            104            30,397
New Mexico             464            24,224            359            64,242
New York               942            28,264              0            79,188
North Carolina         159             8,142            214            25,204
Ohio                   344            35,978            255            62,969
Oklahoma                31            14,965             20            38,110
Utah                   243            42,534            117           124,140
Vermont                378           161,063            416           301,221
Wisconsin              444            72,674            327           195,906

All 13 States        3,877           640,695          2,460         1,378,051

SOURCE: Projected enrollments are based on State Project enrollment data submitted to the SPI Project Office and have been extrapolated through the end of September 2002 (Table I.3). The numbers of SSI and SSDI beneficiaries were computed using matching files developed from SSA administrative data (Chapter V).


In general, we found that most of the comparison counties selected in our initial round (before the state review process) were also selected by the various alternative methods (Table II.6). For example, in California, 80 percent of the counties selected with the basic method were also selected by a method that gave equal weight to each of the area characteristics used to select comparison areas. For all states except Illinois, roughly two-thirds or more of the counties selected by the regression method were also selected by the various alternative methods (the statistics for Illinois exclude comparison areas that were selected for Chicago primarily through discussions with State Project staff). In Illinois, the selection of counties appears to be somewhat more sensitive to the weights, but even then, 50 percent or more of the counties selected by the regression-based method are selected by the alternative methods. Thus, it appears that the specific counties selected by these alternative processes were very similar, although each method selected a slightly different set of counties.

There is similar stability of the final selections made after consultation with State Project staff. In particular, most of the counties that were ultimately selected after the State Project review would have been on the list of counties generated by giving each characteristic an equal weight in the selection process (Table II.7). For example, 88 percent of the comparison counties that were ultimately selected in our basic approach, including the review by the State Project staff, were on the initial list of counties selected by a process that weighted all 13 characteristics equally. As before, comparison county selection was most sensitive to the specific weights being used in Illinois and New York; in most other states about 90 percent of the counties selected in the basic approach were also on the list generated by the equal-weight approach. Thus, most of the comparison counties that we selected could have been selected by the State Projects if we had used equal weighting in the selection process rather than regression-based weights.

Overall, we feel that the basic regression-based weights provide the best available basis for selecting comparison counties, despite the slight sensitivity of the county selections to the choice of weights. The regression-based weights reflect the extent to which the various characteristics are related to employment among SSI beneficiaries in the states used in the evaluation. This approach seems to offer a better foundation for comparison site selection than does the simple equal-weight approach. Also, the area characteristics seem to influence employment for SSI beneficiaries with mental illness in the same general way that they influence employment of all SSI beneficiaries.

In any event, it appears that modest differences in county characteristics may not be particularly important. The marginal effects of county characteristics tend to be fairly small, particularly in comparison to the marginal effects of a person's prior employment. Thus, two counties may provide similar employment environments for disabled workers even though their characteristics are somewhat different. For example, one of the North Carolina demonstration areas has a poverty rate of 28.3 percent, while the values for the selected comparison counties do not exceed 24.4 percent. However, in practical terms, the difference between 28 percent and 24 percent of a county population living in poverty may not have much of an influence on whether SSI beneficiaries can find and hold jobs.


TABLE II.6

PERCENT OF COUNTIES SELECTED IN THE REGRESSION-BASED METHOD ALSO SELECTED BY ALTERNATIVE SELECTION METHODS

                          Alternative Comparison County Methods
                   Equal Weights       Only Statistically     County Characteristic
                   Given to All        Significant County     Weights Based on Sample
State              County              Characteristics        of Beneficiaries with
                   Characteristics     Given Weight^a         Mental Illness^b

California               80                   90                      90
Illinois^c               50                   65                      --
Iowa (SSA)               90                   --                      --
Minnesota                85                   75                      --
New Hampshire            --                   --                      --
New Mexico               69                   84                      --
New York^c               70                   80                      90
North Carolina           73                  100                      --
Ohio                     76                   83                      62
Oklahoma                 71                   --                      68
Utah                     82                   92                      --
Vermont                  81                   95                      --
Wisconsin                79                   89                      --

NOTE: New Hampshire is excluded from these sensitivity tests because it had too few counties to make the regression-based county selection process feasible.

^a The regression models for Iowa and Oklahoma did not have any county characteristics that were statistically significant at the 90 percent level using a two-tail test.

^b The only states that exclusively target beneficiaries with mental illness are California, New York, Ohio, and Oklahoma.

^c The regression-based method was used to select comparison counties for all demonstration counties in Illinois and New York other than their respective major cities (Chicago and New York City).


TABLE II.7

PERCENT OF COUNTIES SELECTED BY STATE PROJECTS ALSO SELECTED BY THE EQUAL-WEIGHT METHOD

State Project       In the Top 5 Matches When       In the Top 10 Matches When
                    All County Characteristics      All County Characteristics
                    Are Given Equal Weight          Are Given Equal Weight

California                     63                              88
Illinois^a                     82                              82
Iowa (SSA)                     87                              96
Minnesota                      82                              93
New Hampshire^b                na                              na
New Mexico                     65                              82
New York^a                      0                               0
North Carolina                 40                              80
Ohio                          100                             100
Oklahoma                       61                              83
Utah                           66                              89
Vermont                        73                              89
Wisconsin                      83                              95

^a The regression-based method was used to select comparison counties for all demonstration counties in Illinois and New York other than their respective major cities (Chicago and New York City). The sensitivity test applies only to the counties selected using the regression-based method.

^b New Hampshire is excluded from these sensitivity tests because it had too few counties to make the regression-based county selection process feasible.

III

COMPARISON BENEFICIARY SELECTION PROCESS

Comparison groups for the 13 State Projects included in the core evaluation are selected from among those SSA beneficiaries who, after the State Project was implemented, both lived in the project's comparison area and met its targeting criteria. As described in Chapter II, a State Project's comparison area includes nondemonstration counties that are similar along many important area characteristics to the counties in which the project was implemented. The targeting criteria of State Projects vary, but they often require that beneficiaries have a particular diagnosis or meet a particular age requirement. Statistical matching using propensity scores is used to select comparison groups.

The comparison group selection process is based on a data file (hereafter, the matching file) that contains, for participants and potential comparison group members, information from several extracts of SSA's data system and information from several government agencies, including the U.S. Bureau of the Census, the U.S. Bureau of Labor Statistics, and the U.S. Department of Agriculture. Included in the matching file are several demographics, numerous monthly pre-enrollment outcomes, and many area characteristics. Two of the monthly pre-enrollment outcomes—employment and earnings—are available only for SSI beneficiaries and only during periods when they are on the SSI rolls. They are not available for DI-only beneficiaries, regardless of whether they received benefits. Moreover, this information may be inaccurate, because of the way SSA collects it.1 To address this limitation, the matching file also contains calendar-year employment and earnings from the Summary Earnings Record (SER). This information, though measured on a calendar-year instead of a monthly basis, is more complete and accurate than the monthly information.2

Generally speaking, this matching process selects comparison groups that are similar to participants along many important characteristics. For SSI beneficiaries, the characteristics include 7 demographics, 24 months of pre-enrollment information for 9 outcomes, 5 calendar-year measures of pre-enrollment employment, 5 calendar-year measures of pre-enrollment earnings, and 13 area characteristics—a total of nearly 250 variables. These are most of the characteristics that the literature has found are related to the outcomes for which net outcomes will be computed. For DI-only beneficiaries, the number of characteristics is slightly smaller, but the most important variables, including calendar-year earnings and monthly benefit receipt, are still included.

1 We obtained monthly earnings data for SSI beneficiaries from SSA's REMICS files. These files are created at the end of each month. As such, these files contain estimated earnings amounts. SSI beneficiaries are required to estimate their earnings a few months in advance and then reconcile their reports when pay stubs or other verifying information become available. However, SSA field office staff often ask beneficiaries to overestimate their earnings to reduce the probability of an overpayment. Verified earnings estimates are available in other SSA data extracts, but those extracts contain only countable earnings rather than total earnings. Countable earnings cannot be used to estimate the net effects projects have on earnings because they incorporate the effects of SSI work incentives. Projects are expected to affect the use of those incentives, and as a result countable earnings will differentially capture participants' actual earnings.

2 Agodini et al. (2002) describe how the matching file is created and provide a brief description of the Summary Earnings Record.

The rest of this chapter describes in detail the way in which comparison groups are selected. It also presents the results of a preliminary analysis we conducted to assess the success of the matching process using early cohorts of participants in four of the State Projects. These analyses must also be conducted when the matching process is implemented in the future using the full samples of participants in each of the 13 State Projects included in the core evaluation. Last, we provide tips for implementing the matching process in the future.

A. HOW COMPARISON GROUPS ARE SELECTED

Our goal was to develop a matching process that selects comparison groups that are similar to participants along many characteristics. A straightforward way to do this is to select, for each participant, a comparison group member from a pool of potentials who is identical along each characteristic. The problem with this approach is that the pool of potentials may not contain enough people to produce an exact match for every participant. For example, if 10 dichotomous variables are used to select comparison groups, there are 1,024 possible combinations of values for those variables. The combination of characteristics found for some participants may not be found even among the very large pools of potential comparison group members developed for this evaluation. However, that sample may contain all the people needed to select a comparison group that is similar to participants, on average. The issue then is how to select such a group.

1. Propensity Score Matching

Rosenbaum and Rubin (1983) showed that when many characteristics are used in the matching process, statistical matching using propensity scores could be used to select comparison groups that are similar, on average, to participants along those characteristics. The propensity score is a single number that can be used to determine the extent to which one person is similar to another along observed characteristics. The authors showed that, in situations where the outcome is independent of participant status, given the observed characteristics, the outcome is also independent of participant status, given the propensity score. As a result, matching people using propensity scores produces a comparison group that is similar, on average, to participants along the observed characteristics.

To illustrate how propensity scores can be used to select comparison groups, consider the following simple example. Suppose that we want to select, from a pool of potentials, a comparison group that is similar to participants by sex. Of course, an easy way to select this group would be to note the sex of each participant and select a potential comparison group member with the same sex. However, suppose that we instead wanted to use propensity scores to do the matching. Also suppose that, among the combined sample of participants and potential comparison group members, 50 percent (or 0.5) of males are participants, whereas 30 percent (or 0.3) of females are participants. In this case, all males in the combined sample would be assigned a propensity score of 0.5 and all females a score of 0.3. We would then examine the propensity score of each participant and select a potential comparison group member with the closest absolute propensity score. As such, participants with a score of 0.5 (males) would be paired with potential comparison group members with a score of 0.5 (males), and likewise for individuals with a score of 0.3 (females).
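In code, this toy example amounts to computing the participant share within each sex group. The fragment below is purely illustrative; the tiny data set and column names are invented for the example and are not drawn from the evaluation's files.

    # Toy illustration: with sex as the only characteristic, the propensity
    # score is simply the participant share within each group. Data invented.
    import pandas as pd

    df = pd.DataFrame({"male": [1, 1, 0, 0, 0],
                       "participant": [1, 0, 1, 0, 0]})
    df["score"] = df.groupby("male")["participant"].transform("mean")
    print(df["score"].tolist())  # males get 0.5; females get 1/3 in this toy data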

Our matching process uses propensity scores to select comparison groups. However, when computing propensity scores, our process uses many more characteristics than in the simple example above. In particular, it includes the following three steps (an illustrative code sketch follows the list):

1. Estimate a probability model of participant status. A logit model is estimated, where a binary dependent variable that equals one for participants and zero for potential comparison group members is regressed on independent variables that represent individual characteristics. (The characteristics included in the logit model are described later in this chapter.) We estimate the model using participants and all potential comparison group members.

2. Assign a propensity score to each individual. The propensity score is a single number that equals the weighted sum of an individual's values for the characteristics included in the logit model, where the weights are the parameter estimates of the logit model.

3. Select comparison group members using propensity scores. For each participant, the potential comparison group member with the closest absolute propensity score, or the "nearest neighbor," is selected. The selection process is done with replacement, so that a potential comparison group member can be matched to several participants.
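The sketch below illustrates these three steps in Python. It is only an illustration of the technique, not the SAS macro actually used in the evaluation (described later in this chapter); the data frames and column names are assumptions made for the example, and the predicted probability is used as the score (a monotone transformation of the weighted sum defined in step 2, so it orders individuals identically).

    # Illustrative sketch of the three-step matching process; not the
    # evaluation's SAS implementation. Column names are hypothetical.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def select_comparison_group(participants, potentials, characteristics):
        # Step 1: logit model of participant status (1 = participant).
        combined = pd.concat([participants.assign(p=1), potentials.assign(p=0)],
                             ignore_index=True)
        X = sm.add_constant(combined[characteristics])
        logit = sm.Logit(combined["p"], X).fit(disp=False)

        # Step 2: assign each individual a propensity score.
        combined["score"] = logit.predict(X)
        p_scores = combined.loc[combined["p"] == 1, "score"].to_numpy()
        c_scores = combined.loc[combined["p"] == 0, "score"].to_numpy()

        # Step 3: nearest-neighbor selection, with replacement, so one
        # potential member can be matched to several participants.
        matches = [int(np.argmin(np.abs(c_scores - s))) for s in p_scores]
        return potentials.iloc[matches]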

It is worth noting that propensity scores generally have no behavioral significance. Consider our simple example of how propensity scores could be used to select comparison groups. In that example, 50 percent of males and 30 percent of females are participants—a difference of 20 percentage points. This indicates that males are more likely by 20 percentage points to be a participant than females. However, this result is based on the combined sample of participants and potential comparison group members from the comparison area—individuals who were never actually eligible to participate in a State Project. A more accurate understanding of the factors that influence participation is achieved by examining participants and eligible non-participants—individuals who were eligible to participate in a State Project but did not. To illustrate this point, suppose that, among the combined sample of participants and eligible non-participants, 50 percent of males and 40 percent of females are participants—a difference of 10 percentage points. This indicates that, among individuals who were eligible to participate in a State Project, males are more likely by 10 percentage points to be a participant than females, not by 20 percentage points as the results based on potential comparison group members from the comparison area wrongly suggest.


2. Potential Comparison Group Members

Comparison groups are selected from among those SSA beneficiaries who, after a particular State Project was implemented, both lived in the State Project’s comparison area and met (what we refer to as) the project’s primary targeting criteria. These criteria vary across the State Projects, but they often require that beneficiaries have a particular diagnosis or meet a particular age requirement. The sample of potential comparison group members is not limited to those individuals who further met (what we refer to as) the project’s secondary criteria, because the SSA data rarely contain information about these criteria. The secondary criteria are more subjective than the primary ones, and include items such as whether it has been determined that a beneficiary needs project services in order to increase earnings substantially. For most State Projects, there are thousands, sometimes tens of thousands, of potential comparison group members (Table II.5).

For each State Project, comparison groups are selected separately for up to four groups of participants, as defined according to SSI/SSDI receipt and the population density of the county in which they live. Specifically, participants are first divided into those who received only SSI benefits or both SSI and SSDI benefits at intake (hereafter, SSI participants) and those who received only SSDI benefits at intake (hereafter, DI-only participants). Participants are divided in this way to make use of the monthly pre-enrollment employment and earnings information that is available for SSI beneficiaries, but not for DI-only beneficiaries. Also, the State Projects are likely to affect employment behavior of the two groups differently, because the SSI and SSDI programs provide different work incentives. These two groups of participants are further divided into those who live in populous counties and those who do not. (A county is defined as populous if its population density exceeds 90 people per square mile; this generally includes counties with 50,000 or more residents.) This second division helps us select a comparison group that is similar to participants along the area characteristics used in the area matching process, particularly the general level of economic activity and the availability of employment support services.
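The four-way division can be expressed compactly. The sketch below is illustrative only; the column names (ssi_at_intake, county_density) are hypothetical.

    # Illustrative sketch of the four-way division of participants.
    import pandas as pd

    def matching_cell(row):
        # The SSI group includes concurrent SSI/SSDI recipients at intake.
        benefit = "SSI" if row["ssi_at_intake"] else "DI-only"
        # Populous: density above 90 people per square mile.
        area = "populous" if row["county_density"] > 90 else "non-populous"
        return benefit + "/" + area

    participants = pd.DataFrame({"ssi_at_intake": [True, False],
                                 "county_density": [120.0, 45.0]})
    print(participants.apply(matching_cell, axis=1).tolist())
    # ['SSI/populous', 'DI-only/non-populous']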

Ideally, we would also like to divide each of these four participant groups into cohorts based on enrollment date. For example, we would like to select comparison groups separately for participants who enrolled in a particular State Project during the first month after it was implemented, during the second month, and so on. Selecting comparison groups for cohorts of participants would ensure that participants and their respective comparison groups met the project's targeting criteria during the same time period.

The problem with this approach is that there are too few participants in each cohort for us to judge effectively whether our matching process selects well-matched comparison groups.3 This is also true if fewer cohorts were created, such as quarterly ones. For example, an average of about 60 participants enrolled in the Wisconsin project during each quarter after it was implemented. Assuming that about half of these were SSI participants and the other half DI-only participants, and that half of each of these groups lives in a populous county and the other half in a non-populous county, we would need to select a comparison group for an average of about 15 participants in each cohort. Assuming that 15 comparison group members were selected for each quarterly cohort of participants, statistical tests of the similarity of the characteristics across participants and their respective comparison groups would be based on 30 observations. Samples of this size are unlikely to be sufficient to detect important differences in the characteristics of the two groups. As a result, our matching process does not select comparison groups on a cohort basis.

3 Later in this chapter, we explain why it is important to examine whether the matching process actually produced comparison groups that are similar to participants.
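A rough power calculation makes the point concrete. The sketch below assumes a two-tailed t-test at the 0.05 level and a moderate difference of half a standard deviation; these assumptions, and the use of statsmodels, are choices made for the illustration rather than calculations from the report.

    # Rough power check for the 15-versus-15 cohort example above.
    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower().solve_power(effect_size=0.5,  # 0.5 SD difference
                                        nobs1=15, ratio=1.0, alpha=0.05)
    print(f"power = {power:.2f}")  # roughly 0.26, far below the usual 0.80 target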

3. When Characteristics of Potential Comparison Groups Are Measured

When deciding whether a potential comparison group member is a good match for a participant, an important issue is the point in time at which we measure the potential comparison group member’s characteristics. Individuals tend to enter programs, like a State Project, when they are interested in finding a job or increasing their earnings—not at a random point in their lives (Ashenfelter 1978). Thus, we need to find someone who is at the same point in the employment decision process but did not have the opportunity to enroll in a State Project.

For example, suppose that we are looking for a comparison group member for a male participant who worked during the month before enrollment, but not during the month of enrollment. This man may have enrolled in a State Project because he just lost a job. Also, suppose that the sample of potential comparison group members contains a male who lived in the State Project's comparison area and met the project's targeting criteria during three consecutive months. Last, suppose that this potential comparison group member worked during the first month but not during the second and third months. This potential comparison group member would be a good match for the participant described in the example above if we measured his characteristics during the second month, but not if we measured them during the third.

Our matching process addresses this issue by first determining the months during which each potential comparison group member lived in the State Project's comparison area and met its targeting criteria. We then pick one of those months at random (hereafter, the pseudo-enrollment date) and measure the characteristics of the comparison group member during it. A potential problem with this approach is that, because our matching process focuses on each comparison group member's characteristics at a specific point in time, it may fail to identify another point when that beneficiary might be a good (or even better) comparison group member for a specific participant. However, as described later in this chapter, our assessment of the matching process shows that this is not an issue, because we easily found a comparison group that is similar to participants.
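A minimal sketch of the pseudo-enrollment step, assuming a precomputed list of eligible months for each potential comparison group member (the list and its month format are invented for the example):

    # Illustrative sketch of assigning a pseudo-enrollment date. In the
    # evaluation, eligible_months would hold the months during which a
    # beneficiary lived in the comparison area and met the targeting criteria.
    import random

    def pseudo_enrollment_month(eligible_months):
        # Pick one eligible month at random; matching characteristics are
        # then measured as of that month.
        return random.choice(eligible_months)

    random.seed(0)  # reproducibility for this example only
    print(pseudo_enrollment_month(["2000-04", "2000-05", "2000-06"]))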

4. Tests Used to Assess the Similarity of Participants and the Comparison Groups

In addition to showing the usefulness of propensity scores for selecting comparison groups, Rosenbaum and Rubin (1983) also showed that a comparison group selected using propensity scores could produce unbiased impact estimates if two conditions are satisfied: (1) all the characteristics that are related to participant status and also to outcomes are observed, and (2) participants and comparison group members with similar propensity scores are similar along these characteristics. The second condition means that the logit model must produce an estimate of the propensity score such that, at each value of the estimated propensity score, the characteristics of participants and comparison group members are similar.

It is difficult to determine whether our comparison groups satisfy the first condition. As described later in this chapter, our matching process selects comparison groups that are similar to participants along most of the characteristics that the literature has found are related to the outcomes for which net outcomes will be computed. However, there are certain characteristics that the literature indicates are related to the outcomes of interest that have not been included in the matching process because they are not available in the SSA data. For example, the SSA data do not contain information about the occupation and industry in which a beneficiary works. Whether it is important to include any of these additional characteristics in the matching process depends on the extent to which they influence outcomes among individuals who have been matched along all the characteristics included in the matching process. This is an open issue. Fortunately, there are analyses we can conduct to help us understand whether our design produces valid net outcomes—the ultimate goal of the design. These validity analyses, which are conducted as part of the design, are described in Chapter IV. However, before conducting these analyses, it is important to first determine whether our comparison groups satisfy the second condition.

To determine whether our matching process satisfies the second condition, we must compare the characteristics of participants and comparison group members with similar propensity scores, as the second condition indicates. Specifically, we must first assign participants and comparison group members to strata, where each stratum includes participants and comparison group members whose propensity score is not significantly different.4 We then conduct, within each stratum, two-tailed t-tests of the similarity of each characteristic across participants and comparison group members. We consider a comparison group to be well matched to its respective group of participants if 95 percent of these statistical tests failed to detect a difference (at the 0.05 level). For ease of exposition, we refer to this as "the 95 percent test." If a comparison group does not pass the 95 percent test, the logit model must be respecified and the comparison group reselected until it does pass.

4The strata are defined in a way often done in other studies (see, for example, Dehejia and Wahba 1998). In particular, the collection of participants and comparison group members is first ranked according to their propensity scores. Individuals are then divided into strata with an equal number of individuals in each. Each stratum should contain enough individuals to ensure that statistical tests conducted within it have enough power to detect any meaningful differences in the characteristics of participants and comparison group members. A stratum that contains about 80 individuals (where about 40 are participants and the other 40 are comparison group members) should be sufficient. Within each stratum, a statistical test of the similarity of the propensity score of participants and comparison group members is then conducted. If each of these tests fails to reject a difference, then the strata have been properly defined.
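A minimal sketch of this within-stratum testing follows, assuming a combined data frame with a participant flag p, an estimated propensity score column score, and the matching characteristics (all names are illustrative and carried over from the earlier sketch):

    # Sketch of the within-stratum "95 percent test"; strata are defined as in
    # footnote 4, with roughly equal numbers of individuals in each.
    import pandas as pd
    from scipy import stats

    def pct_similar(df, characteristics, n_strata=3):
        # Rank by score, then cut into equal-sized strata.
        df = df.copy()
        df["stratum"] = pd.qcut(df["score"].rank(method="first"),
                                n_strata, labels=False)
        failed_to_reject, total = 0, 0
        for _, stratum in df.groupby("stratum"):
            for x in characteristics:
                _, p_value = stats.ttest_ind(stratum.loc[stratum["p"] == 1, x],
                                             stratum.loc[stratum["p"] == 0, x])
                total += 1
                failed_to_reject += p_value >= 0.05
        return 100.0 * failed_to_reject / total  # passes if this is >= 95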


In addition to passing the 95 percent test, the comparison group should show no patterns among the (fewer than 5 percent of) characteristics that are significantly different. For example, we would prefer that any statistically significant differences be scattered among the characteristics used in the matching process rather than clustered among, say, the monthly measures of pre-enrollment earnings. However, just as when beneficiaries are randomly assigned to a participant or comparison group, chance differences might cluster on a particular characteristic. In randomized designs, it is common practice to use regression analysis to adjust for such differences when computing net outcomes. We therefore use regression analysis to compute net outcomes (see Chapter IV).

5. Characteristics Used in the Matching Process

Our goal was to select comparison groups that are similar, on average, to their respective group of participants along all the characteristics that affect the outcomes for which net outcomes will be computed. This would ensure that the comparison groups experience the outcomes that participants would have experienced had they not been exposed to program services. In other words, it would ensure that any differences in outcomes between participants and the comparison groups reflect the effect of project services and not underlying differences between the two groups that affect the outcomes of interest.5

Our approach for meeting this goal was to select comparison groups that are similar to their respective groups of participants along all the characteristics that the literature indicates are related to the outcomes of interest and that are available in the SSA data.6

5 In theory, a well-implemented matching process ensures only that participants and the comparison group are similar along the characteristics included in the matching process. Therefore, there is no guarantee that the two groups are similar along characteristics that were excluded from the matching process, such as combinations of characteristics. However, research has found that including combinations of characteristics in the matching process is generally not necessary because a well-implemented matching process produces a comparison group that is similar to participants along these characteristics (Dehejia and Wahba 1998). Nevertheless, as described in the last section in this chapter, the need to include such characteristics in the matching process should be explored when implementing the design in the future.

6 The outcomes of interest include employment, earnings, SSI/SSDI receipt and benefits, and total income.

The characteristics include the following 7 demographics:

1. Age

2. Sex

3. Race/ethnicity

4. Education

5. Living arrangements

6. Disabling condition

7. Enrollment/pseudo-enrollment date

They also include the pre-enrollment history of the following 9 outcomes:

1. SSI program participation

2. SSI benefits received

3. SSDI program participation

4. SSDI benefits received

5. Use of SSA work incentives

6. Medicaid coverage7

7. Employment

8. Earnings

9. Total income

Last, they include the following 13 characteristics that describe a person's county of residence (see Chapter II):

1. Population density

2. Population growth

3. Unemployment rate

4. Unemployment volatility

5. Total county employment

6. Employment growth

7. Percent of county devoted to farmland

8. Presence of substantial manufacturing in the county

9. Rate of use of public transportation

10. Poverty rate

11. Percent of county that is Hispanic

12. Percent of county that is non-white

13. Pre-demonstration employment rate of SSI beneficiaries

7 A variable indicating Medicare coverage is not included because it is implicitly controlled for by the variable that measures time on SSDI. SSDI beneficiaries are covered by Medicare 24 months after they start to receive benefits. Thus, those on the rolls for less time than that will not have Medicare coverage (with the exception of beneficiaries with end-stage renal disease, for whom there is no waiting period).

Characteristics that the literature finds are related to the outcomes of interest but that are not available in the SSA data include household composition, occupation, industry, and the presence of functional limitations. Another characteristic that is not available in the SSA data, but that may be important to include in the matching process, is the extent to which individuals are motivated to work and therefore interested in receiving project services. Whether it is important to include any of these additional characteristics in the matching process depends on the extent to which they influence outcomes among individuals who have been matched along all the characteristics above. As described in Chapter IV, we conduct several tests to determine whether our comparison groups differ from participants along important characteristics. Preliminary results from those tests suggest that they do not.

Unfortunately, not all the available characteristics can be included in the logit model, because the number of characteristics exceeds the number of beneficiaries in each participant group. Suppose that each characteristic was included in the logit model, using only one variable. The demographics would be included using 7 variables. We have 24 months of pre-enrollment information for each of the 9 outcomes, or a total of 216 monthly variables. We also have 10 variables based on the calendar-year earnings data. Last, there are 13 area characteristics. Taken together, a total of nearly 250 variables would have to be included in the logit model. Not all these variables can be included, because there are not enough beneficiaries in each participant group to estimate a logit model with all the variables. Even if there were enough beneficiaries in a participant group to estimate such a logit model, not all the variables could be included, because many of them are correlated; the parameter estimates of the logit model would suffer from collinearity problems.

To address this issue, the matching process must use the subset of characteristics that produces a comparison group that is similar to participants along all the available characteristics. The subset of characteristics initially used in the matching process should include those that statistical tests indicate are different across participants and potential comparison group members. A comparison group is then selected and the 95 percent test conducted—that is, within each propensity score stratum, a statistical test of the similarity of each available characteristic across participants and the selected comparison group is conducted. Considering that beneficiaries are often divided into 3 strata and that the number of available characteristics equals nearly 250, determining whether the 95 percent test is passed means examining the results of about 750 statistical tests for each of the comparison groups. If 95 percent of these tests fail to detect a difference, we consider the comparison group to be well matched. Otherwise, the logit model must be respecified and the comparison group reselected until the 95 percent test is passed. This process ensures that participants and their selected comparison group are similar along all the available characteristics. Figure III.1 summarizes the process used to select comparison groups.
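Put together, the selection loop can be sketched as a skeleton. This reuses the select_comparison_group sketch above; the evaluate and respecify callables are hypothetical stand-ins (evaluate for something like the pct_similar sketch, respecify for the analyst's judgment in adding higher-order terms, interactions, or omitted characteristics, which is not automated in practice).

    # Skeleton of the iterate-until-pass loop that Figure III.1 summarizes.
    def match_until_pass(participants, potentials, initial_subset,
                         respecify, evaluate, max_rounds=20):
        # evaluate(participants, comparison) should return the percent of
        # within-stratum tests that fail to detect a difference.
        subset = list(initial_subset)  # start from variables that differ
        for _ in range(max_rounds):
            comparison = select_comparison_group(participants, potentials, subset)
            if evaluate(participants, comparison) >= 95.0:
                return comparison
            subset = respecify(subset)  # analyst judgment in practice
        raise RuntimeError("no specification passed the 95 percent test")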

B. PRELIMINARY ASSESSMENT OF THE MATCHING PROCESS

To assess the success of our matching process, we selected comparison groups for early cohorts of participants in the following four State Projects: (1) Iowa, (2) New York, (3) Vermont, and (4) Wisconsin. We chose these projects to understand how successfully the matching process produces well-matched comparison groups for State Projects that operate within a state (Iowa and New York) and for State Projects that operate throughout a state (Vermont and Wisconsin). Our analysis was limited to participants who enrolled by December 2001, because the available data permit examining only those beneficiaries who enrolled by this point.8 We selected comparison groups using a SAS macro called neighbor.sas (Agodini and Dynarski 2001). The way neighbor.sas is used to select comparison groups is described by Khan et al. (2002).

The way in which participants are divided (by SSI/DI-only status and populous/non-populous county) means that we selected a comparison group for 10 groups of participants. Four of the 10 were for the Iowa project: SSI participants in populous counties, SSI participants in non-populous counties, DI-only participants in populous counties, and DI-only participants in non-populous counties. The next 2 of the 10 comparison groups were for the New York project, which enrolled only SSI participants in two sites, New York City and Buffalo. We conducted the matching separately for each of the sites. Another 2 of the 10 comparison groups were for the Vermont project: SSI participants and DI-only participants. We did not divide Vermont participants further into those who live in a populous or non-populous county, because the number that lives in a populous county (40 participants) was too small to support matching. The last 2 of the 10 comparison groups were for the Wisconsin project: SSI participants and DI-only participants. As was the case for Vermont participants, we also did not divide Wisconsin participants further into those who live in a populous or non-populous county, because the number that lives in a non-populous county (49 participants) was too small to support matching.

As mentioned above, comparison groups were selected from among those SSA beneficiaries who, after a particular State Project was implemented, both lived in the State Project's comparison area and met its primary targeting criteria.

8 Our analysis was also limited to individuals who received SSA benefits at some point since January 1999, because we have data for only these individuals. This has almost no effect on the number of participants included in our analysis, because the State Projects we considered enrolled primarily SSA beneficiaries. This is also true for all the other State Projects that are part of the evaluation, except one—the Utah project. The design can include only a subset of Utah project participants, because the Utah project did not limit enrollment to SSA beneficiaries.

FIGURE III.1

PROCESS USED TO SELECT COMPARISON GROUPS

(Flowchart, rendered as a numbered sequence. Notation: P = participants; Cp = potential comparison group members; Cs = selected comparison group members; p-hat = estimated propensity score; X = characteristics in the matching file that the literature indicates are related to the outcomes of interest.)

Step 1. Identify X.
Step 2. Estimate a logit model of participant status on a subset of X, using P and Cp.
Step 3. Assign p-hat to P and Cp.
Step 4. Select a comparison group from Cp based on the p-hat of P.
Step 5. Within each p-hat stratum, compare each X of P and Cs.
Step 6. Do they differ? If yes, respecify the logit model and return to Step 2.
Step 7. If no, Cs is similar to P along all X.


The comparison area for the Iowa project includes nondemonstration Iowa counties that are similar to the Iowa demonstration counties. The comparison area for the New York project includes Westchester and Nassau counties for the New York City site and Oneida County for the Buffalo site. The comparison area for the Vermont project includes counties in Maine, Massachusetts, and New York (excluding the New York City and Buffalo demo sites) that are similar to the Vermont demonstration counties. The comparison area for the Wisconsin project includes counties in Illinois9 and Michigan that are similar to the Wisconsin demonstration counties.

Table III.1 lists the characteristics that were initially included in the logit model, separately for SSI and DI-only beneficiaries. This is the collection of characteristics that statistical tests indicated are different across the 10 groups of participants and their respective pools of potential comparison group members. For SSI beneficiaries, the characteristics included: (1) all the demographics, (2) most outcomes during several periods before enrollment, (3) employment and earnings during several calendar years before enrollment, and (4) several measures used to characterize a beneficiary's entire benefit history. For DI-only beneficiaries, the same set of characteristics was initially included, except for sex and race/ethnicity, as these participants and potential comparison group members did not differ along these characteristics. The monthly measures of employment, earnings, and receipt of unearned income also were excluded, because this information is not available for DI-only beneficiaries.

Models using the initial list of characteristics represented a starting point in the matching process. Matching is an iterative process that is entirely driven by the distribution of characteristics among the participants and potential comparison group members. Thus, it was not surprising that most of the comparison groups we selected using the initial set of characteristics did not pass the 95 percent test. In particular, some of the initial comparison groups for the SSI participant groups differed along some of the monthly pre-enrollment measures that were included in the logit model, such as earnings during the three months before enrollment. Some of these initial comparison groups also differed along monthly pre-enrollment measures that were not included in the logit model, such as benefit receipt during the 13th through 18th months before enrollment. Initial comparison groups for the DI-only participant groups differed in similar ways.

The next step in the process was to respecify the logit model and reselect comparison groups until each one passed the 95 percent test. For characteristics that were already included in the model but nevertheless differed across participants and comparison group members, respecifying the logit meant adding higher-order or interaction terms for those characteristics. In some cases, this meant adding to the logit model terms that have no clear behavioral interpretation, such as the cube root of the county SSI employment rate. For characteristics that were not initially included in the model and differed across participants and comparison group members, respecifying the logit meant adding some of those characteristics. For example, we sometimes had to add benefit receipt during the 18th month before enrollment—a characteristic that was not initially included in the logit model. We then reselected comparison groups until each one passed the 95 percent test.

9While Illinois is operating a State Project, that project serves only youth who are still in school. Thus, Illinois could be used as a source of comparison beneficiaries for the adults in Wisconsin.


TABLE III.1

CHARACTERISTICS INITIALLY USED IN THE BENEFICIARY MATCHING PROCESS

                                                               Participant Group
Characteristic                                                  SSI     DI-Only

Age (in Years)                                                   X         X
Log Age Squared                                                  X         X
Sex                                                              X
Race/Ethnicity (White)                                           X
Years of Education                                               X         X
Years of Education Missing                                       X         X
Living Alone at Enrollment                                       X         X
Lived with Another SSI Recipient                                 X
Lived in Medicaid-Funded Facility in the 2 Years
  Before Enrollment                                              X
Diagnosis of Mental Illness                                      X         X
Diagnosis of Mental Retardation                                  X         X
Diagnosis Missing                                                X         X
Eligible for Medicaid                                            X
On SSI Before Age 18                                             X
Months on SSI (Current Spell)                                    X
Active SSI Status 1 Month Before Enrollment                      X
Active SSI Status 2 Months Before Enrollment                     X
Active SSI Status 3 Months Before Enrollment                     X
Received Cash Benefit 1 Month Before Enrollment                  X
Received Cash Benefit 2 Months Before Enrollment                 X
Received Cash Benefit 3 Months Before Enrollment                 X
Log of Total SSI Benefits Paid 1 Month Before Enrollment         X
Log of Total SSI Benefits Paid 2 Months Before Enrollment        X
Log of Total SSI Benefits Paid 3 Months Before Enrollment        X
Active SSDI Status 1 Month Before Enrollment                     X         X
Active SSDI Status 2 Months Before Enrollment                    X         X
Active SSDI Status 3 Months Before Enrollment                    X         X
Ever Used SSI Work Incentives During Year Before Enrollment      X
Employed in Month Prior to Enrollment                            X
Employed in 2nd Month Prior to Enrollment                        X
Employed in 3rd Month Prior to Enrollment                        X
Log of Earnings in Month Prior to Enrollment                     X
Log of Earnings in 2nd Month Prior to Enrollment                 X
Log of Earnings in 3rd Month Prior to Enrollment                 X
Square of Log of Earnings in Month Before Enrollment             X
Square of Log of Earnings in 2nd Month Before Enrollment         X
Square of Log of Earnings in 3rd Month Before Enrollment         X
Earnings Missing in Month Prior to Enrollment                    X
Earnings Missing in 2nd Month Prior to Enrollment                X
Earnings Missing in 3rd Month Prior to Enrollment                X
Received Unearned Income During Any of the 3 Months
  Before Enrollment                                              X
Employed During 1999                                             X         X
Employed During 1998                                             X         X
Employed During 1997                                             X         X
Employed During 1996                                             X         X
Annual Earnings During 1999                                      X         X
Annual Earnings During 1998                                      X         X
Annual Earnings During 1997                                      X         X
Annual Earnings During 1996                                      X         X
Enrolled in 1st Quarter of State Project Operations              X         X
Enrolled in 2nd Quarter of State Project Operations              X         X
Enrolled in 3rd Quarter of State Project Operations              X         X


In some cases, respecifying the logit model resulted in a comparison group that contained fewer unique individuals than earlier specifications, whereas, in other cases, it resulted in more unique individuals. The final number of unique comparison group members that is selected depends on two factors. The first is the extent to which the available characteristics distinguish participants and the pool of potential comparison group members. The second is the extent to which the pool of potential comparison group members contains enough individuals that are similar to participants along the available characteristics.

For each of the 10 participant groups, we ultimately selected a comparison group that either passed the 95 percent test or did not but was close (Table III.2). The latter includes the comparison group for the Vermont SSI participant group and both comparison groups (SSI and DI) for the Wisconsin participant groups. These comparison groups pass a 92 percent test. Although passing a 92 percent test seems reasonable, we nevertheless tried several times to select comparison groups for these participant groups that pass the 95 percent test. Unfortunately, each time we tried, the resulting comparison groups were further from passing the 95 percent test. In cases such as this, where the comparison group does not pass the 95 percent test but is close, we use regression analysis to adjust for the characteristics that are significantly different when computing net outcomes—see Chapter IV. Even in cases where the comparison group passes the 95 percent test, regression analysis is used to compute net outcomes.

For example, consider the results for the Iowa SSI populous matching. Table III.3 reports the size of this participant group and the size of two other groups by the estimated propensity score. The first group is the pool of potential comparison group members; the second is the comparison group selected from the pool of potentials. The number of selected comparison group members is unweighted—that is, it does not take into consideration the number of times a comparison group member was matched to a participant. When weights are used, the number of comparison group members equals the number of participants.

Except for the few participants with the highest propensity scores, these results indicate that there is considerable overlap in the propensity scores of participants, potential comparison group members, and the selected comparison group. For example, among the 29 participants with a propensity score between 0.0 and 0.1, there are 869 potential comparison group members with similar propensity scores. From these potential comparison group members, the matching process selected 27 unique comparison group members. Only the three participants with propensity scores between 0.8 and 0.9 do not have comparison group members with a propensity score in the same range. These participants were matched to either the comparison group members with a propensity score between 0.7 and 0.8, or the comparison group member with a propensity score between 0.9 and 1.0, depending on which one of these was the nearest neighbor.

Table III.4 provides additional evidence that participants and the comparison group are well matched. This table reports the average values of the final set of characteristics used in the matching process for participants, the entire pool of potential comparison group members, and the selected comparison group. These results indicate that participants are significantly different from the pool of potential comparison group members.

TABLE III.2

SIMILARITY OF PARTICIPANT AND COMPARISON GROUPS

(Columns: State Project; Type of Beneficiary; Type of Demonstration Area; Participant Group Sample Size; Comparison Area; Comparison Group Sample Size^a; Percent of Similar Characteristics)

Iowa       SSI      Populous        143   Populous non-demo Iowa counties                                  103   96.1
Iowa       SSI      Non-populous    138   Non-populous non-demo Iowa counties                              122   98.4
Iowa       DI-Only  Populous         68   Populous non-demo Iowa counties                                   60   98.6
Iowa       DI-Only  Non-populous     96   Non-populous non-demo Iowa counties                               94   96.1
Vermont    SSI      All             320   Counties in Maine, Massachusetts, and New York                   307   94.3
Vermont    DI-Only  All             190   Counties in Maine, Massachusetts, and New York                   189   98.4
New York   SSI      New York City   181   Westchester and Nassau counties                                  166   95.7
New York   SSI      Buffalo         134   Oneida County                                                     96   97.6
Wisconsin  SSI      All             341   Counties in Illinois, Michigan, North Dakota, and South Dakota   324   93.2
Wisconsin  DI-Only  All             204   Counties in Illinois, Michigan, North Dakota, and South Dakota   202   92.3

SOURCE: Authors' calculations based on SSA administrative data.

^a The number of comparison group members is unweighted—that is, it does not take into consideration the number of times a comparison group member was matched to a participant. When weights are used, the number of comparison group members equals the number of participants.


TABLE III.3

IOWA PROJECT
NUMBER OF SSI PARTICIPANTS AND COMPARISON GROUP MEMBERS IN POPULOUS AREAS,
BY THE ESTIMATED PROPENSITY SCORE

                                              Comparison Group
Estimated Propensity Score   Participants   Potential   Selected^a
0.0–0.1                            29            869         27
0.1–0.2                            24            168         24
0.2–0.3                            21             67         17
0.3–0.4                            22             33         15
0.4–0.5                            13             14          8
0.5–0.6                             5             10          4
0.6–0.7                            15              6          6
0.7–0.8                             9              1          1
0.8–0.9                             3              0          0
0.9–1.0                             2              1          1
Sample Size                       143          1,169        103

SOURCE: Authors' calculations based on the matching file.

^a The number of selected comparison group members is unweighted—that is, it does not take into consideration the number of times a selected comparison group member was matched to a participant. When weights are used, the number of selected comparison group members equals the number of participants.


TABLE III.4

IOWA PROJECT
SSI PARTICIPANTS AND COMPARISON GROUP MEMBERS IN POPULOUS AREAS:
CHARACTERISTICS USED IN THE MATCHING PROCESS

                                                                Comparison Group
Characteristic                                 Participants   Potential   Selected
Diagnosis
  Mental disorder                                  58.7         35.2*       60.1
  Mental retardation                               20.3         27.6*       20.3
  Missing                                           5.6          8.4         3.5
Age                                                37.6         39.1        36.3
Male                                               44.1         46.8        58.0*
White                                              92.3         87.3*       93.7
Education
  Years of schooling                               11.8         10.6*       11.9
  Missing                                          21.7         31.7*       19.6
Lived Alone Pre-Month 1                            80.4         84.0        79.7
Lived with SSI Recipient Pre-Month 2                2.1          4.1         3.5
Eligible for Medicaid Pre-Month 1                  74.8         72.7        80.4
On SSI Before 18 Years Old                         11.5         19.9*       10.2
Months on SSI During Recent Spell                  15.4         18.0*       16.3
Number of Months Active on SSI Program
  Between Pre-Month 1 and Pre-Month 3               2.5          2.8*        2.6
Received SSI Benefit
  Pre-month 1                                      59.4         69.4*       59.4
  Pre-month 2                                      59.4         67.3*       59.4
  Pre-month 3                                      56.6         64.8*       60.1
Log Total SSI Benefit Paid
  Pre-month 1                                       3.1          3.8*        3.3
  Pre-month 2                                       3.1          3.7*        3.2
  Pre-month 3                                       2.9          3.6*        3.1
Active on SSDI Program
  Pre-month 1                                      74.1         47.6*       74.1
  Pre-month 2                                      72.0         46.8*       74.8
  Pre-month 3                                      69.9         45.9*       73.4
Ever Used Work Incentive Pre-Year 1                48.3         25.6*       51.7
Employed in Pre-Month 1                            49.0         22.4*       47.2
Log Earnings
  Pre-month 1                                       2.9          1.2*        2.8
  Pre-month 2                                       2.7          1.2*        2.8
  Pre-month 3                                       2.5          1.2*        2.8
Log Earnings Missing
  Pre-month 1                                      15.4          6.8*        9.1
  Pre-month 2                                      16.1         11.6        10.5
  Pre-month 3                                      21.7         15.1*       15.4
Received Unearned Income Pre-Months 1, 2, and 3    44.8         43.9        46.9
Log Age Squared                                     7.2          7.2         7.1
Log Earnings Squared
  Pre-month 1                                      17.4          6.5*       17.3
  Pre-month 2                                      16.4          6.7*       17.3
  Pre-month 3                                      15.1          6.8*       17.1
Annual Employment
  1999                                             66.4         44.2*       67.8
  1998                                             65.0         45.3*       59.4
  1997                                             67.8         43.5*       65.0
  1996                                             67.8         45.2*       62.9
Log of Annual Earnings
  1999                                              5.3          3.4*        5.4
  1998                                              5.2          3.5*        4.7
  1997                                              5.3          3.3*        5.0
  1996                                              5.2          3.4*        4.7
Enrolled
  Within 1st quarter                               11.2         22.4*        9.1
  Between 1st and 2nd quarters                     25.9         18.6*       21.7
  Between 2nd and 3rd quarters                     28.7         22.7        28.7
Unweighted Sample Size^a                          143         1,169        103

SOURCE: Authors' calculations based on the matching file.

^a Statistics were computed using weights, where each participant received a weight of 1, each potential comparison group member received a weight of 1, and each selected comparison group member received a weight equal to the number of times he or she was matched to a participant.

*Significantly different from participants at the .05 level, two-tailed test.


In particular, among the 47 variables used in the matching process, only 10 are similar across participants and potential comparison group members. In contrast, participants and the selected comparison group are similar along nearly all the characteristics used in the matching process. The only characteristic along which participants and the selected comparison group differ is sex.

The last and perhaps strongest piece of evidence that participants and the selected comparison group are well matched comes from the results of the nearly 750 within-stratum statistical tests. We divided participants and comparison group members into three strata based on their propensity scores, where two-tailed t-tests of the average propensity score of participants and comparison group members within each of the three strata did not detect a significant difference between the two groups (at the 0.05 level). Within each stratum, we then conducted t-tests of the similarity of each of the nearly 250 variables between participants and the comparison group—a total of nearly 750 statistical tests. Only 3.9 percent of these tests detected a difference in the characteristics between participants and the comparison group. Moreover, there do not appear to be any patterns among the dissimilar characteristics. For example, among individuals in the lowest of the three strata, the only characteristics that differ between the two groups are: (1) the proportion that lived alone during the first month before enrollment, and (2) earnings during pre-enrollment months 23 and 24. Although there do not appear to be any patterns among these dissimilar characteristics, regression analysis is used to adjust for these differences when computing net outcomes (see Chapter IV).

C. TIPS FOR IMPLEMENTING THE MATCHING PROCESS IN THE FUTURE

Results of the matching process assessment described in this chapter, along with results from an initial assessment we conducted several months ago (Agodini et al. 2002), suggest several tips for implementing the matching process in the future. First, they indicate that future implementations must be performed on SSA’s computer. Creating the files used by both the beneficiary matching process and the computer programs used to produce net outcomes requires extracting and processing a huge amount of data from SSA’s administrative system. For example, in order to construct the file used to conduct the preliminary assessment of the matching process described earlier, we worked with SSA to extract from their administrative system about 85 data files that comprise 8 distinct extracts, each of which contains at least a million records, if not several million. We then processed those data files on SSA’s mainframe computer to create a file that can be used to select comparison groups for each of the 13 State Projects included in the core evaluation. This involved running many computer programs, some of which took days to complete. It would be extremely slow and cumbersome to process these data on currently available personal computers. Moreover, calendar-year employment and earnings information is needed to select well-matched comparison groups, and, because of confidentiality rules, only SSA staff can use those data.

Second, the results indicate that significant researcher and SSA staff time is needed to implement the matching process in the future. Selecting a well-matched comparison group for participants of each State Project involves an iterative process with hundreds of statistical tests in each round. Furthermore, the nature of statistical matching means that evaluators cannot use the empirical literature or theories to determine how to respecify models that do not produce well-matched comparison groups. The resulting trial-and-error process is unpredictable with respect to how long it will take to produce good matches. Nevertheless, our experience suggests that, with sufficient time, well-matched comparison groups can be selected. Moreover, since only SSA staff can access the calendar-year employment and earnings data that are needed to select well-matched comparison groups, significant SSA staff time is also needed.

Third, the results indicate that it is important not to use previous implementations of the matching process as a starting point for future implementations. When selecting the comparison groups for the State Project participants examined in this chapter, we initially included in the logit model the subset of characteristics that helped us find, as part of an intermediate assessment of the matching process, a well-matched comparison group for an earlier cohort of Wisconsin participants (Agodini et al. 2002). In many cases, that subset of characteristics was not appropriate for the State Project participants we examined in this chapter. In fact, in these cases, we modified the matching process several times, only to learn that our modifications did not help select a well-matched comparison group. Only after we started over with an initial model that included the subset of characteristics that differed significantly between participants and potential comparison group members did our iterative matching process converge on a well-matched comparison group.

Fourth, in order to increase the power of the design, the possibility of selecting larger comparison groups that are well matched to each group of participants should be explored when the matching process is implemented in the future. Our matching process selects, for each participant, only one comparison group member—the nearest neighbor—from the pool of potentials. The preliminary assessment of our matching process shows that, on average, the number of unique comparison group members selected by this process equals about 90 percent of the participant group sample size (Table III.2). As described in Chapter V, samples of this size will be sufficient to detect policy-relevant impacts for many of the State Projects. However, as we mentioned, for most State Projects there are thousands, sometimes tens of thousands, of potential comparison group members. Therefore, it is possible that some of the potential comparison group members that were not selected using the nearest-neighbor matching algorithm are also similar to participants, on average. If so, it would be useful to select these additional comparison group members when implementing the matching process in the future, as this would increase the power of the design. We have developed a SAS macro called caliper.sas that can be used to select more potential comparison group members than just the nearest neighbors. This macro selects, for each participant, all the potential comparison group members whose propensity score falls within a specified range (often referred to as a caliper) of the participant's propensity score. The way caliper.sas is used to select comparison groups is described by Khan et al. (2002).
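The caliper.sas macro itself is a SAS program documented in Khan et al. (2002); as a rough illustration of the caliper idea only, the following Python sketch selects every potential comparison group member whose propensity score falls within a fixed caliper of some participant's score (the DataFrame layout and the pscore column name are assumptions, not the macro's actual interface).

    import pandas as pd

    def caliper_select(participants, pool, caliper=0.01):
        """Return the unique pool members whose propensity score lies within
        +/- caliper of at least one participant's propensity score."""
        selected = set()
        for score in participants["pscore"]:
            within = pool.index[(pool["pscore"] - score).abs() <= caliper]
            selected.update(within)
        # A pool member can fall inside several calipers; keep each one once.
        return pool.loc[sorted(selected)]

A wider caliper admits more comparison group members, and hence more power, at the cost of admitting poorer matches, so the caliper width would need to be tuned against the balance tests described above.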

Fifth, the need to include important interaction terms in the matching process should be explored. When conducting the preliminary assessment of the matching process, we examined the extent to which participants and the comparison group are similar along each available characteristic. However, we did not examine the extent to which the two groups are similar along combinations of the available characteristics. For example, we examined whether the sex of participants and the comparison group is similar, as well as the race/ethnicity. However, we did not examine whether the two groups are similar along combinations of sex and race/ethnicity, such as a similar proportion of white males. Dehejia and Wahba (1998) found that including interaction terms is generally not necessary, because a matching process similar to ours produced a comparison group that was similar to participants along higher-order terms. Nevertheless, it is worth exploring the need to include interaction terms in the matching process, particularly any that the process analysis being conducted by the Project Office indicates are related to participation.
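To make the idea concrete, the sketch below shows how an interaction could enter a propensity-score logit. This is a Python illustration rather than the evaluation's actual SAS programs, and the data and variable names (participant, male, white, age) are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical matching file: a participation flag plus a few covariates.
    rng = np.random.default_rng(1)
    mf = pd.DataFrame({
        "participant": rng.integers(0, 2, 1000),
        "male": rng.integers(0, 2, 1000),
        "white": rng.integers(0, 2, 1000),
        "age": rng.integers(18, 65, 1000),
    })

    # 'male * white' expands to both main effects plus the male:white
    # interaction, so the propensity model can balance, for example, the
    # proportion of white males rather than sex and race separately.
    fit = smf.logit("participant ~ male * white + age", data=mf).fit(disp=False)
    mf["pscore"] = fit.predict(mf)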

Last, the results indicate that a trade-off will probably need to be made between selecting comparison groups that are well matched along individual characteristics and those that are well matched along area characteristics. For many State Projects, area characteristics cannot be included in the beneficiary matching process, because there is little variation in those characteristics across the demonstration or comparison areas. For example, the New York project contains only two demonstration areas. Our strategy for selecting a comparison group that is similar to participants along area characteristics was to select separate comparison groups for participants in populous areas and those in non-populous areas. We divided participants along this characteristic because State Project staff often reported that population density was an important factor when reviewing the initial list of comparison areas we sent them. Although this ensures that participants are well matched to their respective comparison group along population density, it does not ensure that the two groups are well matched along other area characteristics. We do not consider this a serious problem, because we found that, after adjusting for individual characteristics, the characteristics of the area in which a beneficiary lives have an extremely small effect on labor market outcomes.

IV

COMPUTING NET OUTCOMES AND ASSESSING THE VALIDITY OF THE RESULTS

As described in the previous chapter, our matching process for the core evaluation selects comparison groups that are well matched to participants along many important characteristics; however, despite our best efforts to select well-matched comparison groups, results based on a comparison group design are inherently uncertain. Our process for selecting such groups ensures that they are similar, on average, to participants along all the characteristics for which we have data. These are most of the characteristics that the literature has found to be related to the outcomes for which net outcomes will be computed. However, it is possible that the two groups differ along important characteristics for which we do not have data and, therefore, could not use in the matching process.

For example, individuals who are motivated to work may also be the types of individuals who are interested in participating in one of the State Projects. Unfortunately, SSA administrative data do not contain direct measures of the extent to which an individual is motivated to work. Therefore, we cannot include this characteristic directly in the matching process. Consequently, results based on our comparison groups may not reflect the true effect of the State Projects. Rather, they may reflect both the true effect of the State Projects and important unmeasured differences between participants and the comparison groups—differences that affect outcomes.

The design helps to adjust for these unmeasured characteristics by estimating net outcomes using a statistical method often referred to as difference-in-differences (D-in-D). This method estimates net outcomes by first computing the difference between pre- and post-enrollment outcomes for each person. Net outcomes are then estimated by comparing the average pre-post enrollment change in outcomes of participants with the average pre-post enrollment change in outcomes of comparison group members. This approach implicitly accounts for all characteristics—both measured and unmeasured—that do not change over time when estimating net outcomes.
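In its simplest form, with one pre-enrollment and one post-enrollment observation per person, this logic can be written compactly (a restatement of the description above, with \bar{Y} denoting a group mean, P participants, and C the comparison group):

\hat{\Delta}_{D\text{-}in\text{-}D} = \left( \bar{Y}^{P}_{post} - \bar{Y}^{P}_{pre} \right) - \left( \bar{Y}^{C}_{post} - \bar{Y}^{C}_{pre} \right)

Any characteristic whose influence on Y is the same before and after enrollment enters both the pre- and post-enrollment means for its group and therefore cancels out of the estimate.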

In the interest of providing SSA policymakers with accurate evidence of the effect of the State Projects, we have designed two analyses to assess whether the core evaluation design will produce valid results. These validity analyses examine outcomes, rather than the characteristics used to assess the success of the matching process (see Chapter III).

The first validity analysis selects comparison groups for participants several periods before enrollment and then examines net outcomes during the periods after the earlier date used in this matching, but prior to enrollment. If our matching process is accurate, these net outcomes should equal zero, because neither participants nor comparison group members received any project services. This analysis is particularly useful for understanding the validity of the results produced by our design for those State Projects that are not conducting experiments.


The second validity analysis compares net outcomes based on our comparison group design with net outcomes based on an experimental design. Because experimental methods are widely regarded as the benchmark for estimating the effect of an intervention, the results of this analysis will provide strong evidence of the validity of the design developed for the core evaluation. At this point, it appears that this validity analysis can be conducted for only one of the four State Projects that are conducting experiments (New York), because only that project is likely to enroll enough beneficiaries to support the analysis.

The rest of this chapter provides details about how net outcomes are estimated and how the validity analyses are conducted. We also describe results from a preliminary implementation of the validity analyses based on an early cohort of participants in the New York project. The preliminary results suggest that our design will provide accurate evidence of the effect of the State Projects. However, these results should not be considered conclusive evidence of the design's ability to produce valid results. That assessment should be based on a future implementation of the validity analyses using the full set of participants in all the State Projects and a longer follow-up period than was available for our preliminary analysis.

A. METHOD USED TO COMPUTE NET OUTCOMES

Net outcomes are estimated separately for SSI and DI-only beneficiaries, for two reasons. The first reason is to make use of the monthly post-enrollment employment and earnings information that is available for SSI beneficiaries, but not for DI-only beneficiaries. The second reason is that the State Projects are likely to affect the employment behavior of the two groups differently, because the SSI and SSDI programs provide different work incentives. This also means that, for any State Project where comparison groups were selected separately for participants who live in populous and non-populous areas, those samples need to be combined before net outcomes are estimated.

Net outcomes are estimated using the D-in-D method. This method compares the average pre-post enrollment change in outcomes of participants with the average pre-post enrollment change in outcomes of comparison group members.

The D-in-D method is operationalized using a statistical model. To illustrate how the model is used, assume that we observe outcomes during each of the two months before enrollment and each of the two months after enrollment—a total of four months. Also assume that net outcomes are estimated using an analysis file that contains, for each person, a separate observation for each outcome we observe. In this case, the analysis file contains four observations for each individual and is used to estimate the following statistical model:

Y = \alpha + \beta_2 \mathit{PRE2} + \delta_1 \mathit{POST1} + \delta_2 \mathit{POST2} + \gamma P + \lambda_2 [\mathit{PRE2} \cdot P] + \tau_1 [\mathit{POST1} \cdot P] + \tau_2 [\mathit{POST2} \cdot P] + \xi X + \varepsilon

where Y equals the monthly outcomes; PRE2 equals 1 if the outcome corresponds to the second month before enrollment and zero otherwise; POST1 equals 1 if the outcome corresponds to the first month after enrollment and zero otherwise; POST2 equals 1 if the outcome corresponds to the second month after enrollment and zero otherwise; P equals 1 for participants and zero otherwise; X includes characteristics of individuals measured at the time of enrollment; and ε is a random error term. X is included in the model to adjust for any characteristics that were included in the matching process, but that still may differ across participants and comparison group members. X is also included to increase the precision of the net outcome estimates. For continuous outcomes, regression analysis is used to estimate the parameters of the model, whereas logit analysis is used for dichotomous outcomes.1

The parameters of the model are interpreted in the following way:

α = average value of Y during pre-enrollment month 1 for the comparison group
β2 = average difference for the comparison group in Y between pre-enrollment months 2 and 1
δ1 = average difference for the comparison group in Y between post-enrollment month 1 and pre-enrollment month 1
δ2 = average difference for the comparison group in Y between post-enrollment month 2 and pre-enrollment month 1
γ = participant and comparison group difference in the average value of Y during pre-enrollment month 1
λ2 = participant and comparison group difference in the average value of the difference in Y between pre-enrollment months 2 and 1
τ1 = participant and comparison group difference in the average value of the difference in Y between post-enrollment month 1 and pre-enrollment month 1
τ2 = participant and comparison group difference in the average value of the difference in Y between post-enrollment month 2 and pre-enrollment month 1
ξ = the relationship between X and Y

The parameters τ1 and τ2 are interpreted as the D-in-D effect of the program on Y during the first and second months after enrollment, respectively. These effects are net of any difference in Y that may exist between participants and comparison group members during the first month before enrollment.

1To facilitate interpretation of the results, a statistical software package that computes marginal effects for logit models is used to estimate these models—see Khan et al. (2002).


The parameter λ2 is used to determine if the results are sensitive to the pre-enrollment outcome that is used in the calculation. λ2 indicates the participant and comparison group difference in Y between the second and first months before enrollment. For example, if the participant and comparison group difference in Y during the second month before enrollment is greater than the difference during the first month before enrollment, λ2 will be positive. Comparing τ1 and (τ1 – λ2) indicates the extent to which the net outcome estimate during the first month after enrollment is sensitive to the pre-enrollment outcome that is used in the calculation. Both of these are interpreted as the D-in-D effect of the program on Y during the first month after enrollment. However, the first estimate (τ1) is net of any difference in Y that may exist between participants and comparison group members during the first month before enrollment, whereas the second estimate (τ1 – λ2) is net of any difference that may exist during the second month before enrollment. Similarly, to understand whether net outcomes during the second month after enrollment are sensitive to the pre-enrollment outcome that is used in the calculation, τ2 is compared to (τ2 – λ2). Below we discuss how the results can be interpreted, if they are sensitive to the pre-enrollment outcome that is used in the calculation.
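As a concrete illustration, the sketch below estimates the four-month model by ordinary least squares for a continuous outcome. The evaluation itself uses SAS programs on SSA's mainframe; this Python version, built on a synthetic analysis file, is only meant to show the structure of the person-month data and where τ1, τ2, and λ2 appear among the fitted coefficients.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic analysis file: one row per person-month, with the indicator
    # variables defined in the text and one baseline covariate X.
    rng = np.random.default_rng(0)
    people = pd.DataFrame({
        "person_id": range(500),
        "P": rng.integers(0, 2, 500),   # participant indicator
        "X": rng.normal(size=500),      # baseline characteristic
    })
    months = pd.DataFrame({             # pre-enrollment month 1 is the omitted reference
        "PRE2":  [1, 0, 0, 0],
        "POST1": [0, 0, 1, 0],
        "POST2": [0, 0, 0, 1],
    })
    df = people.merge(months, how="cross")
    df["Y"] = 5 + 2 * df["P"] + df["X"] + rng.normal(size=len(df))  # placeholder outcome

    # The coefficients on POST1:P and POST2:P are tau_1 and tau_2; PRE2:P is
    # lambda_2. Standard OLS standard errors are used, per Rubin and Thomas (1992).
    fit = smf.ols("Y ~ PRE2 + POST1 + POST2 + P + PRE2:P + POST1:P + POST2:P + X",
                  data=df).fit()
    print(fit.params[["PRE2:P", "POST1:P", "POST2:P"]])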

The difference between the data used in the example above and those that are actually used to estimate net outcomes is that the actual data contain additional pre- and post-enrollment outcome information. For example, the data contain 24 months of pre-enrollment earnings and ultimately can contain 60 or more months of post-enrollment earnings. Therefore, future implementations of the core evaluation will use analysis files that contain, for each person, separate observations of earnings for each of the 24 pre-enrollment months plus separate observations for each of the post-enrollment months—a total that could easily exceed 80 observations by the time of the final evaluation.

The additional pre- and post-enrollment outcome information is incorporated into the statistical model above by adding the following variables: (1) indicator variables for the additional outcome information for each person, except for the first pre-enrollment outcome, which serves as the reference; and (2) variables that interact the additional outcome information with participant status. In particular, the following statistical model can be used when the data contain M pre-enrollment outcomes and N post-enrollment outcomes:

Y = \alpha + \sum_{t=2}^{M} \beta_t \mathit{PRE}_t + \sum_{t=1}^{N} \delta_t \mathit{POST}_t + \gamma P + \sum_{t=2}^{M} \lambda_t [\mathit{PRE}_t \cdot P] + \sum_{t=1}^{N} \tau_t [\mathit{POST}_t \cdot P] + \xi X + \varepsilon

where the variables and parameters are defined in the same way as in the previous statistical model.

The parameters τt are interpreted as the D-in-D effect of the program on Y during month t, where t ranges from the first month after enrollment to the last one contained in the analysis file. These effects are net of any difference in Y that may exist between participants and comparison group members during pre-enrollment month 1.

The parameters λt are used to determine if the results are sensitive to the pre-enrollment outcome that is used in the calculation. In particular, (τt – λt) indicates the D-in-D effect of the program on Y during month t when pre-enrollment month t (which ranges from 2 to M) is used in the calculation, instead of pre-enrollment month 1. If none of the λt parameters are statistically significant, it seems reasonable simply to interpret the τt parameters as the D-in-D effect of the program during month t. If some or all of the λt parameters are statistically significant, we suggest using only the more recent pre-enrollment outcomes, such as the ones during the past six months, to calculate net outcomes. For example, if λ2, λ3, and λ24 are statistically significant, we suggest reporting the average of (τt – λ2) and (τt – λ3) as the D-in-D effect of the program during month t. This calculation does not take into consideration the λ24 parameter, because the participant and comparison group difference during the 24th month before enrollment seems less important than the differences during the second and third months.
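Continuing the earlier sketch, this sensitivity check is simple arithmetic on the fitted coefficients (the patsy-style coefficient names below match that hypothetical four-month fit, not the evaluation's actual programs):

    # D-in-D effect in post-enrollment month 1, benchmarked to pre-enrollment
    # month 1 (tau_1) versus pre-enrollment month 2 (tau_1 - lambda_2).
    tau_1 = fit.params["POST1:P"]
    lambda_2 = fit.params["PRE2:P"]
    adjusted_tau_1 = tau_1 - lambda_2
    # In the full model, with significant lambdas for several recent
    # pre-enrollment months, the analogous adjusted estimates would be
    # averaged as described above.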

Standard errors of the net outcome estimates are computed using the standard formula for computing standard errors of regression and logit models. Using the standard formula is somewhat counterintuitive, because participants and their respective comparison groups are not random samples, as the standard formula requires. Instead, the comparison groups are selected based on the characteristics of participants. Nevertheless, the appropriateness of using the standard formula was established in theoretical work done by Rubin and Thomas (1992).

B. ANALYSES USED TO ASSESS THE VALIDITY OF THE RESULTS

Although the D-in-D method adjusts for unmeasured characteristics that do not change over time, it cannot adjust for unmeasured characteristics that do change over time. Continuing with the example above, it does not adjust for changes in the extent to which an individual is motivated to work. Since motivation is likely to affect outcomes, changes in motivation may also affect outcomes. Of course, the extent to which changes in unmeasured characteristics affect the results depends on the extent to which they influence outcomes among individuals who have been matched using many important characteristics, including many pre-enrollment observations of the outcomes of interest. Nevertheless, the possibility exists that the results will not reflect the true effect of the State Projects.

In this section, we describe two analyses that must be conducted to assess the extent to which the core evaluation design will produce valid results. The first analysis selects comparison groups for participants several periods before enrollment, and then examines net outcomes during the periods after the earlier date used in this matching but prior to enrollment. The second analysis compares results based on our design with results based on an experimental design. Next we describe the way in which each of these analyses is conducted.

74

IV. Computing Net Outcomes and Assessing the Validity of the Results

1. Matching Several Periods Before Enrollment

The first validity analysis is an adaptation of an approach pioneered by Heckman and Hotz (1989). This analysis begins by matching beneficiaries using characteristics measured several periods before enrollment.2 Because values for some of these characteristics, such as race/ethnicity, do not change over time, they remain identical to what was measured at the actual enrollment date. However, others, such as pre-enrollment employment and earnings, are different, because they do change over time. Net outcomes are then estimated during the periods after the earlier date used in this matching, but prior to enrollment.

For example, this approach matches beneficiaries using characteristics measured six months before enrollment and estimates net outcomes during the five subsequent months prior to enrollment. If the matching process was accurate, net outcomes during the five months prior to enrollment should equal zero, because neither participants nor comparison group members received any project services. If these net outcomes are different from zero, it suggests that the matching process did not select a comparison group that is well matched to participants. Put differently, it suggests that the experiences of the selected comparison group do not represent those that participants experienced.

It is important to conduct this validity analysis during a short time period before enrollment to minimize the chance that important characteristics of the demonstration and comparison areas changed disproportionately during the period of interest. This validity analysis assumes that net outcomes examined during a period earlier than the one for which net outcomes are actually computed are informative about the validity of the actual results. It is possible that, for example, labor market conditions before enrollment might be different from the conditions afterward. Since labor market conditions affect many of the net outcomes of interest, examining pre-enrollment net outcomes may not provide a clear understanding of the validity of the post-enrollment ones. To address this possibility, the analysis should be conducted during a short period just before enrollment—for example, during the six months prior. This should increase the usefulness of the analysis, because the conditions of an area during the six months before enrollment are not likely to be radically different from the conditions afterward, when net outcomes are computed.

Nevertheless, area conditions can change quickly. Therefore, it is also important to examine how key economic conditions—such as employment rates of SSA beneficiaries—and the policy environment of the demonstration and comparison areas changed between the six months before enrollment and the subsequent months during which net outcomes are computed. For example, the Oklahoma project is being implemented in five sites throughout the state and is offering eligible beneficiaries a voucher that they can use to purchase employment services from any of several approved vendors. Our design selects a comparison group from similar areas in Oklahoma that have not implemented the Oklahoma project. However, starting in February 2002, beneficiaries who are eligible to participate in the Oklahoma project can also obtain a Ticket-to-Work (hereafter, a ticket). A ticket is similar to the voucher the Oklahoma project offers, because both provide beneficiaries with the means to purchase employment services. Therefore, any comparison group we select from the comparison areas we proposed for the Oklahoma project will have access to the same services as those available to participants of that project. It is important to consider this type of contamination of the comparison group when interpreting the results of the validity analyses. Chapter I describes many of the new demonstrations and initiatives that are being fielded.

2For potential comparison group members, characteristics are measured several periods before their pseudo enrollment date—see Chapter III. For ease of exposition, the pseudo enrollment date for potential comparison group members is referred to as the enrollment date.

Last, it is important to avoid conducting this validity analysis with outcomes that are used to determine program eligibility. For example, many of the State Projects require that beneficiaries receive SSA benefits at the time of enrollment. For participants in these State Projects, our matching process selects comparison groups from among those beneficiaries who both lived in the project's comparison area and met its targeting criteria, which, in this case, includes receiving SSA benefits at the time of enrollment. Moreover, since most participants who received SSA benefits at the time of enrollment also received benefits during several months before enrollment, our matching process will select a comparison group where most also received SSA benefits several months before enrollment. As a result, if this validity analysis was based on receipt of SSA benefits during the six months before enrollment, net outcomes during this period are likely to equal zero. However, this evidence does not indicate the extent to which the design will produce valid results. Instead, it reflects one of the criteria used to determine program eligibility.

An important matter is what should be done if the pre-enrollment net outcomes examined in this validity analysis do not equal zero, which would suggest that the matching process selected a comparison group whose outcomes differ from those participants would have experienced. For example, suppose that, for participants and comparison group members that were matched six months before actual enrollment, the net outcome on employment during each of the five subsequent months before enrollment equals 10 percentage points. The matching process appears to have selected a comparison group that underestimates by 10 percentage points the employment rate that participants experienced. This underestimate should be subtracted from the post-enrollment net outcomes on employment. For example, suppose that the net outcome on employment during the first month after enrollment equals 15 percentage points. Subtracting the pre-enrollment net outcome of 10 percentage points would result in a post-enrollment net outcome of 5 percentage points.

2. Comparing Our Comparison Group Results to Experimental Results

The second validity analysis involves comparing results based on experimental methods with results based on our comparison groups. This analysis begins by computing experimental results for each of these State Projects. Experimental results are computed as the difference in average outcomes between the randomly assigned treatment and control groups. To adjust for any chance differences that exist between the two groups, the D-in-D method is used to produce these results. Results based on our design are computed in a similar way, except that comparison groups that have been selected based on the characteristics of the treatment groups are used instead of the randomly assigned control groups. Since experimental methods are widely regarded as the benchmark for estimating the effect of an intervention, the results of this analysis will provide strong evidence of the validity of the comparison group design developed for the core evaluation (Ashenfelter 1987; LaLonde 1986; Orr 1999).

This validity analysis can be conducted for the New York project, as it is both conducting an experiment and enrolling enough beneficiaries to support the analysis. It may also be possible to conduct this analysis for the New Hampshire project, since that project also conducted an experiment. The issue is whether the New Hampshire project will enroll enough beneficiaries.

3. Alternative Validity Analyses

We also considered another validity analysis that assesses our matching process using individuals who were eligible to participate in a State Project but did not (hereafter, eligible non-participants, or ENPs). It involves comparing outcomes of ENPs with those of a comparison group that is selected for ENPs. If our matching process is accurate, the outcomes of ENPs and their comparison group should be identical, because neither group received any project services.

This approach offers at least two important advantages. First, it assesses validity during the period when net outcomes are estimated (the Heckman and Hotz approach uses a pre-enrollment period). Second, it can provide validity assessments at several points in time, limited only by the availability of follow-up data (the Heckman and Hotz approach can provide validity assessments only during the period between the pre-enrollment month when beneficiaries were matched and actual enrollment).

The problem with this approach is that low participation rates in the State Projects may undermine its ability to assess our matching process accurately. In particular, given the low participation rates, ENPs comprise most of the beneficiaries that meet a State Project's targeting criteria. Thus, ENPs are likely to resemble the entire target group. In contrast, participants are likely to be considerably different from the target group, perhaps even along unmeasured characteristics—something that is often referred to as selection bias. As a result, comparing outcomes of ENPs and their comparison group may not provide an accurate assessment of our beneficiary matching process, because selection bias among ENPs may be less of a problem than it is among participants.

In contrast, the Heckman and Hotz approach may provide a more accurate assessment because it uses participants—a group where selection bias is likely to be a significant problem. Therefore, we believe that the Heckman and Hotz approach, together with the second validity analysis, will provide the most useful information for determining whether our design provides accurate evidence of the effect of the State Projects.

C. EXAMINING PRE-ENROLLMENT NET OUTCOMES

As mentioned, this validity analysis begins by selecting a comparison group for a group of participants using characteristics that were measured several periods before enrollment. Net outcomes are then estimated during the period after the matching occurred, but prior to enrollment. If the matching process was accurate, these net outcomes should equal zero, because neither participants nor comparison group members received any project services. If they are different from zero, it suggests that the matching process did not select a comparison group that is well matched to participants.

We conducted this matching for the New York project based on beneficiary characteristics measured 6 months before enrollment. We then estimated net outcomes during months 5 through 1 before enrollment. We decided to conduct this test during the 6-month period before enrollment to reduce the chance that characteristics of the demonstration and comparison area change in important ways during these months and the subsequent months when net outcomes are actually computed. Nevertheless, we examine how the characteristics of the demonstration and comparison areas changed during the period of interest.

As was the case when we matched beneficiaries using characteristics measured at the actual enrollment date, we selected comparison groups for two New York participant groups: (1) participants in the New York City site, and (2) participants in the Buffalo site. We selected comparison groups separately for participants in these two sites to help ensure that each comparison group is similar to its respective participant group along area characteristics. The comparison group for New York City participants was selected from beneficiaries in Westchester and Nassau counties, whereas the comparison group for Buffalo participants was selected from beneficiaries in Oneida County. These are the areas that were selected using the multi-step process recommended for the core evaluation (see Chapter II).

After applying the iterative statistical matching process described in Chapter III, we selected comparison groups that are well matched along all the available characteristics. In particular, only 3.5 percent of the approximately 750 within-stratum statistical tests detected a significant difference (at the 0.05 level) in the characteristics of New York City participants and their comparison group. Similarly, only 2.0 percent of these tests detected a difference in the characteristics of Buffalo participants and their comparison group. In short, both comparison groups pass the 95 percent test we established as our benchmark for judging the accuracy of the matching.

Figure IV.1 reports the percentage of participants and comparison group members that were employed during each of the 12 months before enrollment. The results for participants are based on the combined sample of New York City and Buffalo participants. Similarly, the results for comparison group members are based on the combined sample of New York City and Buffalo comparison group members.


FIGURE IV.1

NEW YORK PROJECT EMPLOYMENT RATES

[Figure: employment rates (percent) of participants and comparison group members during each of the 12 months before enrollment, with the matching point marked at month –6.]

Month –6 is the point at which the matching occurred—that is, the sixth month before actual enrollment. Therefore, months –12 through –7 represent outcomes before the matching occurred, whereas months –5 through –1 represent outcomes after the matching occurred but before actual enrollment.

As can be seen, the percentage of participants and comparison group members that were employed during each of the months between month –12 and month –7 differs by about 4 percentage points; however, according to two-tailed t-tests, none of these differences are significantly different from zero at the 0.05 level. This was expected, since this pre-enrollment information was used in the matching process.

Although outcomes during months –5 through –1 were not used in the matching process, they should be similar across the two groups, because neither participants nor comparison group members received any project services during these months. The results indicate that this is the case. None of the small differences in employment rates of participants and comparison group members during months –5 through –1 are statistically significant, according to two-tailed t-tests at the 0.05 level. Moreover, employment rates of the two groups become more similar during this time period. In particular, during month –5, about 21 percent of participants were employed, compared to about 24 percent of comparison group members—a difference of –3 percentage points. By month –1, the employment rates of the two groups differ by only 1 percentage point.

These preliminary results, together with the way in which important characteristics of the New York demonstration and comparison areas changed during the period of interest, suggest that our design will provide accurate evidence of the effect of the New York project on its participants. As mentioned above, this validity analysis assumes that examining pre-enrollment outcomes for participants and comparison group members that were matched earlier can be used to assess the validity of post-enrollment net outcomes. This may not be the case if important characteristics of the New York demonstration and comparison areas changed in different ways over time. To understand whether this is the case, we examined monthly unemployment rates of the New York demonstration and comparison areas between May 2000 and August 2002.3 Figure IV.2 reports these statistics for the New York City demonstration and comparison areas; Figure IV.3 reports these statistics for the Buffalo demonstration and comparison areas. As can be seen, the unemployment rates of the demonstration areas and their respective comparison areas differ and change throughout the period. However, the relative unemployment rates of the two areas remain about the same throughout the period. Therefore, the labor market opportunities of participants relative to comparison group members remain about the same throughout the period, which suggests that the results of this validity analysis can be used to assess the validity of the net outcome estimates.

FIGURE IV.2

NEW YORK CITY

DEMONSTRATION AND COMPARISON AREA UNEMPLOYMENT RATES

[Figure: monthly unemployment rates (percent) for New York City and the Nassau and Westchester comparison counties, May 2000 through August 2002.]

3This validity analysis was based on participants who enrolled in the New York project between November 2000 and December 2001. (The previous chapter explains why our analysis is limited to participants who enrolled by December 2001.) Therefore, the earliest pre-enrollment outcome examined above was May 2000—six months before the first participants enrolled in the New York project. The latest pre-enrollment outcome examined was November 2001—the month before the last participants included in the analysis enrolled in the New York project. Assuming that net outcomes were computed up through August 2002, this means that we need to examine unemployment rates of the New York demonstration and comparison areas between May 2000 and August 2002.


FIGURE IV.3

BUFFALO DEMONSTRATION AND COMPARISON AREA UNEMPLOYMENT RATES

[Figure: monthly unemployment rates (percent) for Buffalo and the Oneida County comparison area, May 2000 through August 2002.]

D. COMPARISON OF PRELIMINARY FINDINGS FROM THE CORE AND RANDOM ASSIGNMENT EVALUATIONS FOR NEW YORK’S SPI PROJECT

The experiment the New York State Project fielded provides a means for assessing the accuracy of results obtained from the comparison group approach developed for the SPI core evaluation. In general, we find that the core evaluation design replicates the preliminary results from New York's experiment. However, this early comparison is tentative, because the sample currently available from New York is small, and only four months of post-randomization data are available for the full sample. Since it will take a while for the effects of New York's intervention to become evident, and since a large sample of beneficiaries will be needed to detect the modest effects we expect the New York project to produce, a more definitive comparison of the two approaches will have to wait until a larger sample and a longer follow-up period are available.

The comparison of findings begins with a brief overview of the experiment the New York State Project implemented to test its intervention. It then summarizes the preliminary estimates from that experiment and compares net outcomes derived from the experiment and from the core evaluation design when both designs are implemented using the same SSA administrative data set.


1. The New York Experiment

Starting in fall 2000, the New York project began inviting virtually all eligible beneficiaries in its catchment area to participate. It did so by mailing invitations to all SSI beneficiaries who had a primary or secondary diagnosis of mental illness recorded on their SSI administrative record and who lived in New York City or Buffalo (the project did not serve SSDI-only beneficiaries). The letter contained a postage-paid postcard that interested beneficiaries were asked to return (they could also call a phone number to confirm their interest). In all, about 30,000 letters were mailed out, and 2,730 people had returned a postcard by the end of 2001. The project also accepted referrals from agencies that served people with mental illness.

The project randomly assigned beneficiaries who returned a postcard (or who were referred) to one of the following three study groups:

• Full-Service Group. This group of beneficiaries is offered intensive benefits planning and assistance designed to help them understand the work incentives available to them and the ways in which the SSI program permits them to try working without losing their benefits. The project also assesses participants and refers them to other service providers to obtain required employment supports. In addition, four SSI program rules are waived in order to let participants keep more of their earnings. The major waiver lowers the rate at which benefits are reduced in response to earnings, from $1 for every $2 earned to $1 for every $4 earned. Another waiver exempts participants from continuing disability reviews that might have been initiated because of their work efforts.

• Enhanced-Service Group. This group is offered all the services and waivers offered to the full-service group, plus the assistance of an employment coordinator, who helps them obtain services from a wide array of employment-support programs. The coordinator's services are important, because employment services are offered by a variety of agencies and often have different eligibility criteria. Unassisted individuals, even those with formal referrals, can have great difficulty receiving services or maintaining continuity in service receipt.

• Control Group. This group is given a list of available service providers in their community and an SSA guide to work incentives, but is otherwise not offered any services from the New York SPI project or the SSI waivers.

After random assignment, beneficiaries assigned to the full-service or enhanced-service groups were sent additional information about the services available and asked to enroll in the project. Enrollment required interested beneficiaries to sign an SPI consent form and attend an information session that explained SSI work incentives. These extra steps often took two or more months to complete, so, for most participants, service receipt started well after random assignment. The first beneficiaries enrolled in the program in November 2000. By the end of December 2001 (the last date for which data were made available for this analysis), about 24 percent of the beneficiaries randomly assigned to the treatment groups had participated (Table IV.1).


TABLE IV.1

SAMPLE SIZES IN THE NEW YORK PROJECT EXPERIMENT4 (Through 12/01)

Study Group          Assigned to Group    Participated in Project    Participation Rate
Full Service               931                      254                    27.3%
Enhanced Service           959                      246                    25.7%
Control                    840                 not applicable          not applicable
Total                    2,730                      500                not applicable

For purposes of testing the validity of the core evaluation design, we combined the full-service and enhanced-service groups. This gives the analysis a larger sample for comparing the experiment and core evaluation designs: a total of 1,890 people in the combined treatment group and a total of 500 participants. Also, there are no significant differences in the outcomes for beneficiaries assigned to the full-service and enhanced-service groups during the time period covered in this analysis. The final evaluation, however, should plan to conduct separate analyses for the two intervention groups, because they have different policy implications. The extra service coordination provided in the enhanced-service model is intended to produce larger changes in employment and is more expensive to provide than the package of services offered in the full-service model.

Also, we focus only on employment rates, which constitute a key mechanism that is expected to drive changes in benefit receipt and total income. The other key mechanism is earnings, but SSI earnings data cannot be analyzed at this time for New York because SSA field office staff purposefully underreported participants' earnings in the SSA data system in order to implement one of the SSI waivers correctly. The final evaluation will be able to address that problem using information from the implementation analysis and the reporting algorithms used by the field office staff (Khan et al. 2002).

4For the most part, eligible beneficiaries had an equal chance of being assigned to each of the three groups. The one exception was that for a short time the project changed the assignment ratios so that more beneficiaries would be assigned to the full-service or enhanced-service groups. This was done, in part, to help the project operate at full capacity. Thus, the final sample will have a higher proportion of beneficiaries in the intervention groups than in the control group.


2. Preliminary Estimated Effects: Experimental Results

At this time, only limited data are available from New York's experiment. Specifically, we have data covering 10 months: the 6 months prior to random assignment, the month of random assignment, and 3 post-assignment months. Furthermore, we expected to find no treatment-control differences during the period covered by these preliminary data. For the pre-assignment period, random assignment is designed to eliminate systematic differences between the groups; a lack of any treatment-control differences during that period would indicate that randomization was implemented successfully. For the post-assignment period, we do not expect to see any treatment-control differences for several months, because of the lag between random assignment and the start of services. Also, it is likely to take some time for the benefits counseling and employment support efforts to have an effect on the employment behavior of beneficiaries.

As expected, we found no statistically significant differences in employment rates for the treatment and control groups during the pre- and post-assignment months examined here (Figure IV.4). Employment among the control group members tends to be slightly lower than for the treatment group members, but this difference is probably due to random chance. Furthermore, any random differences between the groups tend to persist over time because of the highly stable employment rates (or, more accurately, non-employment rates) among SSI beneficiaries.

FIGURE IV.4

NEW YORK PROJECT EMPLOYMENT RATES FOR TREATMENT, CONTROL, AND COMPARISON GROUPS

[Figure: monthly employment rates (percent employed) of the treatment, control, and matched comparison groups, months –6 through 3 relative to random assignment.]


While the lack of any treatment-control differences was expected, the limited statistical power provided by New York's sample size would make it hard to detect small or even modest effects as significant. With the available sample sizes, the random assignment evaluation has an 80 percent chance of detecting a treatment-control difference of 4 percentage points in employment (using a two-tailed test at the 95 percent level when about 10 percent of the control group is employed). Given the observed employment rates of the treatment and control groups, this means that the design would be hard-pressed to detect effects smaller than a 40 percent increase in employment (from 10 to 14 percent employed). The ability to detect effects will be somewhat greater when the full sample is available and when regression methods are used to control for some of the underlying variation in individual employment status.
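A calculation in this spirit can be reproduced with a standard two-proportion power formula. The sketch below, in Python with statsmodels, plugs in the approximate group sizes from Table IV.1; the report's exact inputs are not shown here, so the numbers are illustrative rather than a reconstruction of its calculation.

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    # Detecting a rise in employment from 10 to 14 percent with a two-tailed
    # test at alpha = 0.05, given roughly 1,890 treatment and 840 control
    # group members (Table IV.1).
    effect = proportion_effectsize(0.14, 0.10)  # Cohen's h
    power = NormalIndPower().power(effect_size=effect, nobs1=1890,
                                   alpha=0.05, ratio=840 / 1890,
                                   alternative="two-sided")
    print(round(power, 2))  # in the neighborhood of 0.8 with these inputs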

3. The Similarity of Employment Changes Estimated Using the Core Evaluation and Experimental Designs Using the Full Treatment Group

To test the accuracy of the matching procedures that are planned for the core evaluation (and were described in Chapter III), we used those procedures to select a group of beneficiaries matched to the New York Project's treatment group. We expected that, if the matching procedures were accurate, outcomes for that matched comparison group would be similar to those of the control group and to the pre-assignment outcomes for the treatment group.

This test suggests that the matching process is accurate, because the employment rates of the matched comparison group closely track those of the control group and those of the treatment group during the time period when we do not expect to see any effects from the intervention (Figure IV.4). In no month was the employment rate for the matched comparison group statistically different from the rates for either the treatment group or control group. The similarity in employment during the six pre-assignment months is not surprising, because those employment rates are used in the matching process. However, the similarity in employment for the comparison and control groups, particularly during the post-assignment months, does suggest that outcomes for the comparison group reflect those that the treatment group would have had in the absence of SPI.

Another way to see this result is to compare estimates based on the treatment-control difference with those derived from the difference between the treatment and matched comparison groups (Figure IV.5). That comparison suggests that while there are slight differences in the specific estimates, particularly during the months prior to random assignment, both sets of estimates give the same overall conclusion. That is, neither suggests that employment rates changed significantly as a result of the project (at least during this preliminary observation period). During the post-assignment period, the two sets of estimates are nearly identical, but it is hard to tell whether that high degree of alignment will persist or merely reflects fortuitous random chance. Therefore, this test should be repeated when additional sample members and follow-up observations become available.


FIGURE IV.5

NEW YORK PROJECT ALTERNATIVE ESTIMATES OF THE NET CHANGE IN EMPLOYMENT RATES

FOR THE FULL TREATMENT GROUP

[Figure: estimated change in employment rate, months –6 through 3 relative to random assignment, comparing treatment-control estimates with treatment-comparison estimates.]

Our general sense of this comparison to New York's experiment is that the comparison group design developed for the core SPI net outcome evaluation generates conclusions similar to those produced by an experiment. Thus, it appears that the core evaluation can produce reliable information about net outcomes. At the same time, this validity assessment is preliminary, since it uses an early sample of the treatment and control groups with a very limited follow-up observation period. In addition, the similarity of findings for the New York project suggests, but does not prove, that the comparison group design will do as well in other states. Therefore, we have recommended that this test be conducted again when more data are available.

V

PROTOCOL FOR IMPLEMENTING THE CORE EVALUATION IN THE FUTURE

A major goal for the design of the core SPI evaluation is to provide SSA with a protocol that can be used to produce net outcome estimates in the future. This future implementation would be based on the final list of participants in all the State Projects and a longer follow-up period than is currently available. To meet this goal, the protocol must describe how the following tasks are completed:

• Obtain SSA data for the final list of participants and potential comparison group members

• Use SSA data to select comparison groups for the final list of participants

• Produce net outcome estimates based on SSA data for the final list of participants and selected comparison groups

Table V.1 summarizes the protocol we have developed. This process, which is run on SSA's computer, uses many computer programs and many data files to construct two files for each State Project. The first (the SSI matching file) contains information for SSI participants and potential comparison group members for these participants; the second (the DI-only matching file) contains DI-only participants and potential comparison group members for these participants.1 The matching files are used to select comparison groups for participants in each State Project, separately for SSI and DI-only participants. The data files from which the matching files are constructed include SPI Project Office data for the list of participants, SSA data for the list of potential comparison group members, and SSA data for both baseline and outcome information for participants and potential comparison group members. Analysis files that contain information just for participants and selected comparison group members are then extracted from the matching files. The analysis files are used to produce net outcome estimates.

We have developed all the computer programs required to implement the core evaluation, so all that is required to implement that design in the future is the final data sets, as well as sufficient researcher time to both select well-matched comparison groups and estimate net outcomes. This researcher time, however, can be substantial, given the complex and iterative nature of the comparison group selection process. To gauge the amount of SSA data that will have to be processed when implementing the design in the future, consider that the preliminary assessment of the matching process described in Chapter III involved processing about 85 SSA data files that make up 8 distinct extracts, each of which contains at least a million records, if not several million.

1Details about the contents of the matching file are presented in Agodini et al. (2001).

TABLE V.1

SUMMARY OF THE PROTOCOL FOR IMPLEMENTING THE CORE EVALUATION

Step 1. Obtain the final list of participants and the list of potential comparison group members
   Input: VCU data for participants; REMICS and ZIP files for potential comparison group members
   Output: Finder file, Historical REMICS, and Historical ZIP

Step 2. Obtain SSA extracts for beneficiaries in the finder file
   Input: Finder file
   Output: SSA extracts, including the SSI Longitudinal, MBR 810/811, 831 disability, and SER

Step 3. Construct analysis variables in the SSA extracts and merge the extracts
   Input: SSA extracts, Historical REMICS, and Historical ZIP
   Output: SSI and SSDI matching files for each State Project

Step 4. Select comparison groups using the neighbor.sas macro
   Input: Matching files
   Output: Analysis files

Step 5. Compute net outcomes
   Input: Expanded analysis files
   Output: Results

This chapter provides an overview of the protocol we have developed. This includes a description of the way the matching file is constructed, comparison groups are selected, and net outcomes estimates are produced. Khan et al. (2002) describe the protocol in greater detail.

A. CREATING THE MATCHING FILE

All the SSA administrative data used in this evaluation are contained in five large systems: the Supplemental Security Record (SSR), the Master Beneficiary Record (MBR), the 831 Disability File, the Numident, and the Summary Earnings Record (SER). The SSR contains all the information needed to generate SSI benefit checks, whereas the MBR contains all the information needed to generate SSDI benefit checks. Each of these data systems also contains information that is not specifically needed for generating benefit checks, but that is useful for research purposes. For example, each data system also contains the race/ethnicity of beneficiaries. The 831 Disability File contains information from initial disability determination decisions, as well as any reconsideration decisions. The Numident contains the master file of assigned Social Security numbers (SSNs). The SER, an extract from the Master Earnings File, contains calendar year earnings information dating back to 1951, as well as some other identifying information, for individuals who have worked in covered employment. Each of these data systems contains tens of millions of records.


The core component of the SPI evaluation uses information from each of these five data systems, although the amount of information obtained from each file differs. In particular, most of the information comes from the SSR and the MBR. Only information about a beneficiary’s education is obtained from the 831 Disability File, and only calendar year earnings information is obtained from the SER. The Numident is used to verify participant SSNs that are obtained from participant data collected by Virginia Commonwealth University (VCU).

Information from the 831 Disability File and the SER is obtained through a “finder file” process: a list of SSNs is submitted to SSA, which, in turn, creates extracts that contain data only for the SSNs on the submitted list.

Unfortunately, it is not currently possible to extract data directly from the SSR or the MBR for more than one SSN at a time. Nor is it possible to extract data from these systems using a field other than the SSN; for example, it is not possible to extract all individuals residing in a specific county or of a particular race/ethnicity.

Instead, information from the SSR and the MBR is obtained by assembling information from several intermediate extracts that are created from those data systems. Some of these extracts are snapshots of how the SSR or MBR looked at regularly scheduled times, such as the end of a specific month. The amount of information contained in these extracts depends on the size of the sample included: extracts that cover all beneficiaries contain only a handful of variables, whereas extracts based on smaller samples (such as a 10 percent random sample of all beneficiaries) contain more. In either case, the variables in these extracts include information only at, or around, the time of the snapshot, not from earlier points in time.

Data from other intermediate extracts can be obtained through the finder file process. These extracts are also snapshots of how the SSR or MBR looked at a particular point in time; however, their variables include longitudinal information covering both the time of the snapshot and periods as far back as the start of the SSI or SSDI programs. The extract from the SSR is called the SSI Longitudinal, and the extract from the MBR is called the MBR 810/811.

The protocol draws on these various extracts to develop the matching and other files used in the core evaluation. Figure V.1 summarizes that protocol.

1. Verifying Participant SSNs

The first step is to verify the SSNs of participants in the State Projects. This is done by submitting to SSA a text file that contains participants’ SSNs, names, dates of birth, and sex, according to a format specified by SSA. The source for this information is data that the SPI Project Office is helping the State Projects collect about their participants. These data are being analyzed by the Project Office to describe the characteristics of beneficiaries who participate in the State Projects, the services they received, and their outcomes (Jethwani et al. 2002).

SSA uses the Numident to verify the SSNs submitted for all participants. This verification process produces two output files: one contains participants whose reported SSNs were verified, and the other contains those whose numbers were not. SSA considers someone to have passed the verification process if at least one of the following two conditions is met: (1) the SSN and name exactly match an entry in the Numident file, or (2) the name and date of birth match using single select routines or alpha search routines. The core evaluation uses only cases that passed the first condition; cases that passed only the second condition are returned to the State Projects for verification of their SSNs.

FIGURE V.1

PROCESS USED TO CREATE THE MATCHING FILES

[Flowchart: Participant data from VCU are verified against the Numident. Monthly REMICS extracts (01/97 through the month of interest) are used to create the SSI finder file and the Historical REMICS file; quarterly ZIP extracts (01/99 through the quarter of interest) are used to create the SSDI finder file and the Historical ZIP file. The SSI and SSDI finder files are merged with the list of verified participant SSNs to create one finder file, which is submitted to SSA. SSA returns the MBR 810/811, SSI Longitudinal, SER, and 831 Disability extracts, each divided into many files; several computer programs consolidate each extract into a single file (SSA runs the programs that consolidate the SER). The consolidated extracts, together with the Historical REMICS and Historical ZIP files, are combined into SSI and SSDI matching files without SER data, and SSA then merges the SER file into each to produce the SER-enhanced SSI and SSDI matching files.]

2. Creating the Finder File

The finder file is created by running computer programs that initially create two finder files, one for SSI beneficiaries and the other for SSDI beneficiaries. Both files contain the SSNs of participants who enrolled in the State Projects up to the point of interest. Both files also contain individuals who, at any time between January 1999 and the point of interest, met the following criteria: (1) were 11 to 64 years old, (2) had an active case, and (3) lived in any one of the demonstration or comparison areas. The first two criteria, along with the “lived in any one of the comparison areas” portion of the third criterion, are used to identify the pool of potential comparison group members. The SSI finder file was created from the monthly REMICS files, which contain data from the SSR records of SSI beneficiaries during a particular month. The SSDI finder file was created from the quarterly ZIP files, which contain data from the MBR records of SSDI beneficiaries during a particular quarter.

Since some individuals receive both SSI and SSDI benefits, a final computer program merges the two finder files to create one finder file with unique SSNs. This program also merges in the verified SSNs of participants to make sure they are contained in the finder file.
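To make these steps concrete, the following is a minimal SAS sketch of the finder file construction, not the actual programs; the data set names, variable names, and the active-case test are illustrative assumptions.

   /* Sketch: build the SSI finder file from stacked monthly REMICS records
      (January 1999 through the point of interest). All names are illustrative. */
   proc sql;
      create table ssi_finder as
      select distinct ssn
      from remics_monthly
      where age between 11 and 64                    /* criterion 1 */
        and active_case = 1                          /* criterion 2 */
        and county in (select county from areas);    /* criterion 3: demonstration
                                                        or comparison areas */
   quit;

   /* An analogous step builds ssdi_finder from the quarterly ZIP files. */

   /* Combine the two finder files with the verified participant SSNs and
      keep one record per SSN. */
   data finder;
      set ssi_finder ssdi_finder verified_participants(keep=ssn);
   run;

   proc sort data=finder nodupkey;
      by ssn;
   run;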

As the monthly REMICS and quarterly ZIP files are processed, two other files needed to create the matching file are generated. The first is what we refer to as the Historical REMICS, which contains the same individuals included in the SSI finder file (that is, the SSN of every SSI beneficiary who meets the finder file inclusion criteria described above). It also includes 34 variables from each of the monthly REMICS files between January 1997 and the point of interest. The variables include sex, race, date of birth, primary diagnosis, earned income, unearned income, use of SSI work incentives, Medicaid eligibility, alien status, residence state, county, and zip code. The Historical REMICS reaches back to the two years before the start of the finder file time frame (that is, January 1997 through December 1998) in order to capture monthly earnings during the two years before anyone met the finder file inclusion criteria.

The second file is what we refer to as the Historical ZIP, which contains the same individuals included in the SSDI finder file (that is, the SSN of every SSDI beneficiary who meets the finder file inclusion criteria described above). It also includes the residence state, county, and zip code from each of the quarterly ZIP files between January 1999 and the point of interest. Unlike the Historical REMICS, the Historical ZIP does not contain information from the quarterly ZIP files during the two years before the start of the finder file time frame (that is, January 1997 through December 1998), because those early ZIP files contain no information that is needed for the matching process (and that would therefore have to be included in the matching file).


3. Obtaining and Processing the SSA Extracts

The combined finder file is then submitted to SSA, and a request is made for four extracts: the SSI Longitudinal, MBR 810/811, 831 Disability File, and SER. SSA uses a standardized format to produce all the extracts but the MBR 810/811; the format of the MBR 810/811 used in the core evaluation was custom-made by SSA staff and contains only the data elements relevant to the core component of the SPI evaluation.

Because each of these extracts is extremely large, SSA breaks each one into several files. For example, the MBR 810/811 we worked with was provided as 22 files; the SSI Longitudinal, as 14 files; the 831 Disability File, as 13 files; and the SER, as 11 files.

Several computer programs are used to create one file from each set of files. Before these programs are run in the future, a couple of lines in each program need to be changed to reflect the month and year of the new extract. A couple of lines in the program that processes the MBR 810/811 files must also be changed to reflect the new record positions of a couple of variables in the extract; these changes are needed because the position of those variables (and of the variables that follow them) shifts in the custom version of the MBR 810/811 extract that SSA creates. One line in the program that processes the SER files must also be changed to reflect the additional calendar-year earnings data that will be included in future SER files. The positions of these variables are obtained from the file layouts that SSA provides with the MBR 810/811 and SER extracts.

4. Merging the Processed Extracts to Create Two Matching Files

After all the extracts are processed, two intermediate matching files are created, one for SSI/concurrent beneficiaries and the other for DI-only beneficiaries. The programs that create these files simply merge the MBR 810/811, the SSI Longitudinal, the 831 Disability File, and the Historical REMICS; the Historical ZIP is also merged when the DI-only matching file is created.

The final matching files are created by SSA staff, who merge the SER data into each of the matching files. SSA staff must add the SER data because restrictions limit its use to SSA employees. After the SER data have been added, a pair of SSI and DI-only matching files exists for each of the 13 State Projects included in the core evaluation.

It is important to monitor closely the structure of the SSA extracts used to create the matching files. In the course of creating an earlier set of matching files, we discovered that the structure of the REMICS and SSI Longitudinal files had changed. For example, in the previous version of the SSI Longitudinal, the payment status of a terminated case was coded as terminated during every month after termination. To save space on SSA's computer, that field was modified to record a missing value for the future payment status of a terminated case. As a result, we had to modify the way we use payment status from the SSI Longitudinal when creating the matching file: for cases that were terminated, we recoded each subsequent month of payment status as terminated.


B. SELECTING COMPARISON GROUPS

Comparison groups can be selected from each State Project’s matching file using a SAS macro we developed called neighbor.sas. This program selects comparison groups using propensity scores. In particular, it selects comparison groups according to the three steps described in Chapter III:

1. The program estimates a probability model of participant status. It estimates a logit model, where a binary dependent variable that equals one for participants and zero for all potential comparison group members is regressed on independent variables that represent individual characteristics.

2. It assigns a propensity score to each individual. This score is a single number that equals the weighted sum of an individual’s values for the characteristics included in the logit model, where the weights are the parameter estimates of the logit model.

3. It uses propensity scores to select comparison group members. It selects, for each participant, the potential comparison group member with the closest propensity score in absolute value, or the “nearest neighbor.” It performs this process with replacement, so that a potential comparison group member can be matched to several participants. (A sketch of these three steps appears below.)
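The following minimal SAS sketch illustrates the three steps; the covariates shown are illustrative placeholders, and the actual neighbor.sas macro handles many more variables, the stratum assignments, and the diagnostic tests described below.

   /* Step 1: logit model of participant status (covariates are illustrative). */
   proc logistic data=matchfile descending;
      model participant = age female pre_earnings psych_diag;
      output out=scored xbeta=pscore;   /* Step 2: the score is the weighted sum
                                           of characteristics, with the logit
                                           coefficients as the weights */
   run;

   /* Step 3: for each participant, select the potential comparison group
      member with the closest propensity score, with replacement (ties would
      need a tie-breaking rule in practice). */
   proc sql;
      create table pairs as
      select p.ssn as part_ssn, c.ssn as comp_ssn,
             abs(p.pscore - c.pscore) as dist
      from scored(where=(participant=1)) as p,
           scored(where=(participant=0)) as c
      group by p.ssn
      having dist = min(dist);

      /* The weight for a comparison member is the number of participants
         to whom it was matched. */
      create table comp_weights as
      select comp_ssn, count(*) as weight
      from pairs
      group by comp_ssn;
   quit;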

Each time neighbor.sas is run, it creates an analysis file that can be used to estimate net outcomes. This file contains the participants, the unique set of selected comparison group members, and all the variables contained in the matching file, unless variables are intentionally deleted in the course of running the macro. Neighbor.sas also adds several variables to the analysis file, including the propensity score, the stratum to which each individual was assigned, and a weight. The weight equals one for participants; for comparison group members, it equals the number of times the member was matched to a participant. For each participant, a final variable indicates the SSN of the selected comparison group member; this variable is coded as missing for comparison group members.

As described in Chapter III, a comparison group selected using propensity scores can produce unbiased impact estimates if two conditions are satisfied: (1) all the characteristics that are related both to participant status and to outcomes are observed, and (2) participants and comparison group members with similar propensity scores are similar along these characteristics. The second condition means that the logit model must produce an estimate of the propensity score such that, at each value of the estimated score, the characteristics of participants and comparison group members are similar.

Neighbor.sas cannot be used to determine whether the first condition is satisfied; however, the statistical tests it conducts can be used to determine whether the second condition has been satisfied. Researchers often make this determination by first assigning participants and comparison group members to strata, where each stratum includes participants and comparison group members whose average propensity scores are not statistically different. Within each stratum, each characteristic of participants and comparison group members is then compared.


If the characteristics do not differ, researchers conclude that the second condition has been satisfied. If they do differ, researchers respecify the logit model by adding higher-order and/or interaction terms and reselect a comparison group, repeating the process until the characteristics of participants and comparison group members within each stratum do not differ.

Neighbor.sas conducts these tests using the characteristics included in the logit model. In particular, the collection of participants and comparison group members is first ranked according to their propensity scores, and individuals are then divided equally into as many strata as the user defines. Within each stratum, three statistical tests are conducted. The first is a t-test of the similarity of the average propensity scores of participants and comparison group members. The second is an F-test of the joint similarity of participants and comparison group members on the collection of characteristics included in the logit model. The third is a series of t-tests of the similarity of participants and comparison group members on each characteristic included in the logit model. If none of these tests rejects the hypothesis of similarity, the second condition has been satisfied.
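A minimal SAS sketch of these within-stratum checks, assuming the scored file from the matching step and five strata; the variable names are illustrative.

   /* Divide individuals into five equal strata based on the propensity score. */
   proc rank data=scored groups=5 out=ranked;
      var pscore;
      ranks stratum;
   run;

   proc sort data=ranked;
      by stratum;
   run;

   /* Within each stratum, t-tests comparing participants and comparison
      group members on the propensity score and on each characteristic.
      The joint F-test can be obtained by regressing participant status on
      all the characteristics within each stratum. */
   proc ttest data=ranked;
      by stratum;
      class participant;
      var pscore age female pre_earnings psych_diag;
   run;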

Unfortunately, not all the available characteristics can be included in the logit model, because the number of characteristics exceeds the number of beneficiaries in each participant group. To address this issue, the program that uses neighbor.sas also conducts, within each stratum, t-tests of the similarity of each available characteristic (not just those included in the logit model) across participants and comparison group members. We considered a comparison group to be well matched to its respective group of participants if 95 percent of these statistical tests failed to detect a difference (at the 0.05 level for a two-tailed test).

C. PRODUCING NET OUTCOME ESTIMATES

As described in Chapter IV, net outcomes are estimated using the method of difference-in-differences (D-in-D), which compares the average pre-post enrollment change in outcomes of participants with the average pre-post enrollment change in outcomes of comparison group members.

The D-in-D method is operationalized by estimating a statistical model using a data set that contains a separate observation for each pre- and post-enrollment outcome observed for each person. For example, if we were analyzing monthly earnings using the currently available information, the data set would contain 24 months of pre-enrollment earnings and 3 months of post-enrollment earnings, a total of 27 observations per person. In future implementations of the core evaluation, there will be 60 or more months of post-enrollment earnings, so the data set will contain, for each person, a separate observation for each of the 24 pre-enrollment months plus one for each post-enrollment month, a total that could easily exceed 80 observations per person by the time of the final evaluation.

This data set is created by running a computer program that expands the analysis file created by neighbor.sas. The program creates all the variables needed to estimate the statistical model that produces net outcome estimates using the D-in-D method.
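A minimal SAS sketch of the expansion and the D-in-D regression, assuming the 27 months of earnings currently available on the analysis file; all names are illustrative, the actual model includes additional covariates, and the standard errors would need to account for the repeated observations per person.

   /* Expand the analysis file to one record per person per month. */
   data long;
      set analysis;
      array earn{27} earn1-earn27;    /* months 1-24 pre-enrollment, 25-27 post */
      do month = 1 to 27;
         post = (month > 24);         /* post-enrollment indicator */
         outcome = earn{month};
         output;
      end;
      keep ssn participant weight month post outcome;
   run;

   /* The D-in-D estimate is the coefficient on participant*post. */
   proc glm data=long;
      weight weight;
      model outcome = participant post participant*post;
   run;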


D. STATISTICAL POWER OF THE CORE EVALUATION

We have computed the minimum detectable difference (MDD) in employment rates for the core evaluation of each State Project (Table V.2). The MDD indicates the smallest change in employment rates that the evaluation would have an 80 percent chance of detecting as statistically significant using a 95 percent one-tailed difference-of-means test.2 These calculations assume that participants and comparison group members were randomly assigned to their respective group. The work of Rubin and Thomas (1992) suggests that calculations using the methods appropriate for randomly assigned groups provide a realistic estimate of power for comparison groups selected using propensity scores. The MDDs also assume that 50 percent of participants would have been employed in the absence of the project interventions. This is generally consistent with the preliminary SPI data, although this rate varies considerably among the states. For example, 60 percent or more of participants in the Iowa, Vermont, and Wisconsin projects were employed at some time during the year prior to enrollment. The rate for New York participants was approximately 35 percent.

MDDs are computed separately for the SSI and DI-only groups because net outcomes are estimated separately for those groups. As a result, the evaluation will estimate 24 sets of net outcomes for the 13 State Projects that are providing SSNs for the core evaluation.

Given these assumptions, an MDD of 10 percentage points means that we have an 80 percent chance of detecting an actual effect that increased employment rates by 20 percent among participants (from 50 to 60 percent employed). The power of the final evaluation can be increased if the regression models used to estimate net outcomes explain a substantial part of the underlying variation among beneficiaries in employment rates: the more of that variation that can be explained, the easier it is for the evaluation to detect changes due to the State Projects. Our preliminary work suggests that these models should be able to explain at least 40 percent of the variance (that is, the models will have an R-squared statistic of at least 0.4).
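The MDDs in Table V.2 follow from the standard power formula for a difference in proportions with equal-sized groups. The minimal SAS sketch below reproduces the values shown for the New York SSI group.

   data mdd;
      z_alpha = probit(0.95);             /* one-tailed test at the 95 percent level */
      z_power = probit(0.80);             /* 80 percent power */
      p = 0.50;                           /* assumed comparison-group employment rate */
      n = 942;                            /* expected New York SSI participants */
      mdd = (z_alpha + z_power) * sqrt(2 * p * (1 - p) / n);
      mdd_adjusted = mdd * sqrt(1 - 0.4); /* regression R-squared of 0.4 */
      put mdd= mdd_adjusted=;             /* about 0.057 and 0.044 */
   run;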

The MDD calculations also assume that the evaluation selects a comparison group the same size as the participant group. This assumption reflects the procedures developed in Chapter III that find a comparison beneficiary for every participant. Our experience, however, was that the matching process tended to match some comparison beneficiaries to more than one participant, so the comparison groups typically contained about 90 percent of the participant group sample size. As an alternative approach, we developed programs that implement a matching process (caliper.sas) that can select more than one comparison beneficiary for each participant as long as those comparison beneficiaries are similar. In that case, the final evaluation would have a larger sample than shown in Table V.2 and could detect smaller MDDs.

2A one-tailed test is appropriate for SPI, because only increases in employment among participants would be policy-relevant.


TABLE V.2

EXPECTED NUMBER OF PARTICIPANTS AND THE MINIMUM DETECTABLE DIFFERENCES (MDD) FOR THE CORE EVALUATION

                                    Expected               MDDs for the           Adjusted MDDs for
                   Participant      Number of              State Project Sites    Each Full State Project
State Project      Group            Participants(a)        (Percentage Points)    (Percentage Points)

California         SSI              155                    14.1                   10.9
                   SSDI-only         79                    19.8                   15.3
Illinois           SSI              196                    12.5                    9.7
                   SSDI-only         37                    28.9                   22.4
Iowa (SSA)         SSI              307                    10.0                    7.7
                   SSDI-only        353                     9.3                    7.2
Minnesota          SSI              160                    13.9                   10.8
                   SSDI-only        179                    13.1                   10.1
New Hampshire      SSI               54                    23.9                   18.5
                   SSDI-only        104                    17.2                   13.3
New Mexico         SSI              464                     8.1                    6.3
                   SSDI-only        359                     9.3                    7.2
New York           SSI              942                     5.7                    4.4
North Carolina     SSI              159                    13.9                   10.8
                   SSDI-only        214                    12.0                    9.3
Ohio               SSI              344                     9.5                    7.4
                   SSDI-only        255                    11.0                    8.5
Oklahoma           SSI               51                    24.6                   19.1
Utah (RSA)         SSI              272                    10.6                    8.2
                   SSDI-only        131                    15.3                   11.9
Vermont            SSI              378                     9.0                    7.0
                   SSDI-only        416                     8.6                    6.7
Wisconsin          SSI              444                     8.3                    6.4
                   SSDI-only        155                     9.7                    7.5

Note: The minimum detectable differences are computed assuming a one-tailed difference-of-means test with a 95 percent significance level and 80 percent power, where the employment rate of the comparison group is 50 percent. The values shown in column 4 assume that no regression adjustment is made, while the values shown in column 5 assume the use of regression models that have an R-squared of 0.4.

(a) Total enrollments are taken from Table I.3. The expected allocation between SSI and SSDI-only cases was estimated using the distributions derived from Project Office data through December 2001 (Jethwani et al. 2002).


Using the procedures that select only one comparison beneficiary for each participant, it appears that there will be sufficient statistical power for many State Projects to detect policy-relevant impacts, although the power of the final analyses depends on the State Projects meeting their recruitment targets. The regression-adjusted MDDs range from 4.4 percentage points for the SSI group in New York, where a sample of 942 participants is expected, to over 15 percentage points in the 3 instances where fewer than 60 people are expected to be enrolled. As a reference point, consider that the Transitional Employment Training Demonstration increased employment rates by 9 to 14 percentage points (Decker and Thornton 1995). If SSA regarded only those net increases in employment that exceeded 14 percentage points as policy-relevant, then there should be enough statistical power to detect such effects in 20 of the 24 project/beneficiary groups. Half the project/beneficiary groups would have enough power to detect a change of 9 points.

The power of the evaluation is greater when states are pooled. For example, if the evaluation combines the samples from the three states that implemented all four SSI waivers (the California, New York, and Wisconsin SSI groups), the MDD for regression-based estimates would be just 3 percentage points. Similarly, if the samples for all states were combined, the resulting sample of approximately 6,000 beneficiaries would yield an MDD of well under 2 percentage points. Such a combined analysis would indicate the overall effect of a policy that gave states the general mandate and funding provided by SPI.

E. SUGGESTED FUTURE SCHEDULE

The full evaluation of SPI will take place during the next few years. The key factors shaping the schedule include the following:

• The date when projects will have completed enrolling participants

• The length of the follow-up period required to observe policy-relevant outcomes

• The availability of the data used in the evaluation

Projects have typically planned to end enrollment (at least for the purposes of their internal evaluations) during fall 2002. That timing reflects the fact that the initial agreements with the states expire in September 2003; states will need the time until then to collect and analyze data on their participants before their final reports are due in December 2003. It is possible that some projects will operate beyond that schedule. If so, it should still be possible to conduct a preliminary evaluation with the sample of participants who had enrolled during 2002.

It will probably take six months or more of followup to observe effects from the projects. Unlike programs that place participants directly in jobs, the State Projects generally offered services that affect employment indirectly. For example, the benefits counseling services offered by all projects help beneficiaries understand the available work incentives. That understanding should help employed beneficiaries maintain their jobs and encourage unemployed beneficiaries to seek jobs.


However, those effects will take a while to materialize, since they rely on the actions participants take after receiving project services.

Most of the administrative data used in the evaluation can be obtained quickly. The major exceptions are the SER and the projects' participant tracking data. The SER data are used to select comparison groups and to measure post-enrollment changes in earnings. These data, which come from tax reports, are typically available to researchers 12 to 14 months after the end of the calendar year to which they apply; thus, the SER data covering 2002 are not expected to be available until early 2004. The system for collecting participant tracking data, particularly data about service receipt, is still being refined. Data covering project activities through September 2002 should be available by May 2003, but data covering all the services provided by the projects appear unlikely to be available before the end of 2003.

Given these factors, we recommend that the core design be implemented in early 2004. At that time, there would be enough SER data to select comparison beneficiaries for all participants. Also, 12 months of follow-up data would be available to measure benefit receipt for all participants and to measure employment and earnings for SSI beneficiaries. In addition, the implementation evaluation should be complete by the end of 2003 and so can provide the qualitative information required to interpret quantitative findings from the core evaluation. A more complete evaluation that used the SER data to measure effects on employment and earnings for SSDI beneficiaries and for SSI beneficiaries who leave the rolls would have to wait until early 2005, when the SER data for calendar 2003 would become available.

VI

SUPPLEMENTAL EVALUATION POSSIBILITIES

Data from the State Projects can be used in several ways to supplement the core evaluation. They can be used to (1) analyze the characteristics associated with participation in a project, (2) examine the relative effects of offering different services, (3) refine estimates from the core evaluation by using more detailed data from the projects, (4) estimate the costs of providing the interventions, and (5) address methodological issues using data from some projects.

These analyses can be designed, but not tested, at this time because the projects have not provided all the required data. All SPI projects except Alaska have provided reasonably complete demographic information about participants, but they are still refining their procedures for collecting and submitting important follow-up information about participants’ service receipt, employment, and participation in benefit programs. The Project Office is taking steps to improve these participant-tracking data and expects to have a preliminary set of data files ready by late spring 2003 that will cover participants who enrolled through September 2002. By the end of 2003, the Project Office should have full participant-tracking data from most State Projects and partial data from others. In addition, the SSA-funded states have agreed to supply the data files they use for their internal evaluations. These states have submitted detailed plans for their evaluations and appear to be well on the way to implementing those plans, but have not yet completed all the associated file development. Most of the State Projects are planning to submit their final evaluations at the end of 2003 or later and will therefore provide their data files at that time.

Even when it becomes available, most of the State Project data cannot be used directly in the core evaluation design for estimating net outcomes. The projects collect data only for participants and the comparison groups they have chosen for their internal evaluation. As a result, detailed data about service use and other key outcomes will be unavailable for the comparison groups selected in the core evaluation.

This chapter presents designs for the five supplemental analyses that appear to be feasible given what is currently known about project data. It then lays out what is known about the State Project data and describes our preliminary assessment of their quality.

A. PRELIMINARY SUPPLEMENTAL EVALUATION DESIGN

At this time, there appear to be five possible supplemental analyses that would use State Project data or state administrative data that are likely to be available by late 2003.


1. Analyze Service Delivery and Participation Using State Project Data

A key issue for the State Projects is the extent to which they delivered their intended interventions. State Project data on service use could be used to address this issue. In particular, those data would support analysis of the extent to which beneficiaries who enroll in a State Project receive each of the key component services the project offers (see Table I.1). It is also important to know whether any beneficiaries enroll and then receive no services.

In addition to describing the services provided, a supplemental evaluation could look at the mix of beneficiaries who are served. This is important for understanding whether the State Project services will be useful for a broad cross-section of SSI or SSDI beneficiaries or only for specifically targeted groups. Some of the participant characteristics that can be examined using State Project data (but not SSA administrative data) include prior use of state Vocational Rehabilitation (VR) services, living arrangements, self-reported disability type, prior occupation, and educational attainment at the time of enrollment. Such an analysis could also compare each State Project's proposed targeting criteria with the types of participants it actually enrolled. In addition, the analysis could compare the characteristics of early and later enrollees to see whether participation patterns shifted over time.

A supplemental evaluation could also combine project data with SSA administrative data to assess the beneficiary characteristics related to participation in a State Project. The SSA database includes information about participants and about all beneficiaries who were eligible to participate. Evaluators could therefore use logit models to estimate the extent to which participation is correlated with a wide array of individual characteristics and experiences. Peikes (2002) conducted a preliminary version of this type of analysis for the Iowa (SSA) and North Carolina projects.

Finally, it may be possible to examine whether the amount and type of services a person received depend on any of that person's measurable characteristics. Such an analysis would help policymakers judge the ability of the State Projects to deliver their full service packages to the various types of participants they enroll.

2. Estimate the Impact of Services Using State Project and Process Data

A second set of supplemental analyses would combine the core evaluation estimates with State Project and process analysis data to estimate the effect of offering specific services. Such an analysis would help to explain differences in the net outcomes produced by the various projects by indicating the extent to which those differences are due to the projects' offering different mixes of services. This analysis would use hierarchical linear modeling. The dependent variables would be the project-specific impact estimates generated in the core evaluation; the independent variables would be created using State Project data and would include a vector of services offered and the average total number of hours of services delivered. Separate analyses should be done for SSI and SSDI beneficiaries, because their different characteristics and the differences in the two programs' work incentives are likely to produce different net outcomes.


With only 13 State Projects, a hierarchical linear model could examine only a few project characteristics. We recommend starting with three key characteristics: (1) whether the project offered SSI waivers, (2) whether the project offered some type of skills training in addition to benefits counseling, and (3) the average number of hours of service delivered to the project's participants. Particularly important will be an analysis of the effects of the waivers, which is required as a condition of granting them. However, because the waivers were integrated with the other services, it will be hard to evaluate them on a project-specific basis, and the limited sample size available for this analysis will make it difficult to identify precisely the effects of any project component. (An alternative approach based on qualitative methods will be used as part of the synthesis of results and is described in Chapter VII.)
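One simple variant of this project-level analysis is a precision-weighted regression of the impact estimates on project characteristics. The sketch below assumes a data set with one record per project/beneficiary group containing the core impact estimate, its standard error, and the three characteristics; all names are illustrative, and a full hierarchical model would require more specialized estimation.

   /* Precision-weighted project-level regression (a simple stand-in for
      the full hierarchical model). All names are illustrative. */
   data projects;
      set project_impacts;               /* one record per project/group */
      precision = 1 / (std_err ** 2);    /* weight by inverse variance */
   run;

   proc reg data=projects;
      weight precision;
      model impact = ssi_waivers skills_training avg_service_hours;
   run;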

Another way to approach this type of analysis is to estimate the effects of groups of projects that offer similar interventions. For example, New Mexico, New York, and Wisconsin offered services that addressed three of the four employment barriers described in Chapter I: benefits policies, service system barriers, and human capital barriers. It would be interesting to see whether their more comprehensive approach was associated with greater changes in net outcomes. Other possible groupings could be based on whether a project had a Medicaid buy-in, vouchers for rehabilitation services, policy changes to make work pay, benefits counseling, employer outreach, or systems change. The final determination of any groupings must wait, however, until the Project Office and the states complete their implementation analyses. It will be particularly important that the groupings reflect the actual content and delivery of project services, as well as a sense of how those services differ from what is available outside the project.

3. Refine Core Earnings Estimates Using State Unemployment Insurance Data for Key State Projects

Ten of the 12 SSA-funded State Projects plan to use quarterly earnings data from their state Unemployment Insurance (UI) systems in their internal evaluations.1 The UI data offer some important advantages for the evaluations. In most states, these data are available more quickly than earnings data from SSA’s Summary Earnings Record (SER). UI earnings data are generally available within three to six months after the end of a quarter. In contrast, SER earnings data are not available for at least 11 months after the end of a calendar year. UI data would also enable the evaluation to examine earnings patterns that are more detailed—for example, the number of consecutive quarters with earnings as a measure of the continuity of employment—which the annual SER data will not. In addition, UI data contain codes that indicate a person’s employer, and those codes could be used to investigate the stability of employment with a specific employer. UI earnings data could also help refine the participant matching process for SSDI beneficiaries. The SSA administrative data provide only calendar-year earnings, but using UI data would enable the evaluation to match on quarterly employment patterns. UI data will also be useful for measuring employment and earnings for those SSI beneficiaries who have left the

1New Hampshire and Ohio do not plan to use state UI data.


rolls. The SSA administrative data contain monthly reports of earnings for SSI beneficiaries only as long as they remain eligible.

The main disadvantages of the UI data are that they capture a slightly narrower range of jobs than do the data recorded in the SER, and that they can be very expensive to process when multiple states are involved. The difference in the range of jobs reflects differences in the types of employment that are covered by state UI systems and by federal Social Security taxes, which are the source of the SER data. The UI data are state-specific and typically contain only jobs with employers located in that state; they therefore exclude the earnings of people who live in one state but work in another. In addition, the UI system typically does not cover people who are self-employed. In contrast, Social Security taxes are collected regardless of the state of residence and are collected from people who are self-employed. The high processing costs stem from the fact that, because UI is a state-operated program, each state has developed its own data system. Thus, efforts to combine UI data from several states must resolve differences in data formats, coding, and confidentiality provisions.

Use of UI data will therefore depend on the final evaluation priorities and resources. The best opportunity seems to be with the New York project. Staff from that project have offered to provide UI data not only for their treatment and control groups, but also for the comparison group developed by the core evaluation. Those data would enable the core evaluation to assess the beneficiary matching and net outcome estimates developed using only SSA data. They would also facilitate analysis of employment continuity and methodological comparisons between the core evaluation and New York's experiment. California, Iowa, and Vermont have also been analyzing UI data for their internal evaluations and could provide SSA with data about participants in those states. At this time, it is unclear whether those states would provide UI data for the core comparison groups selected for their states.

4. Estimate Intervention Costs and Effects on Tax Payments

Estimates of the costs of delivering services will be critical to developing policies based on the experience of the State Projects. Without such estimates, SSA will have little basis for budgeting new services or obtaining the required funding. In addition, estimates of the service costs provide a summary measure of the general intensity of the interventions delivered by the projects. Such a measure would prove useful for analyses of the impact of services.

The best available approach to developing cost estimates starts with the service use data the State Projects supply, which capture the use of 16 types of services. An evaluator could translate use into cost estimates by multiplying each measure of a person's service use by the associated average cost of providing that service. The Project Office had planned to work with the State Projects to develop project-specific estimates of the average cost of each of those services as part of its implementation analysis. Those efforts are currently on hold while the quality of the underlying service data is investigated, but they could be implemented easily if the problems with the service data can be resolved.
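A minimal SAS sketch of this unit-cost calculation, assuming a person-level services file and a project-specific unit-cost lookup; all names are illustrative.

   /* Multiply each person's use of a service by the average unit cost of
      that service and sum across the 16 service types. */
   proc sql;
      create table participant_costs as
      select s.ssn,
             sum(s.hours_used * c.avg_cost_per_hour) as total_service_cost
      from services as s
           left join unit_costs as c
             on s.project = c.project
            and s.service_type = c.service_type
      group by s.ssn;
   quit;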

One alternative approach would be to estimate average costs by dividing the amount of a project's total grant award that was spent on service delivery (rather than on systems change activities) by the number of participants in the State Project.


This approach is certainly feasible, but its accuracy will depend on how well State Projects can account for the specific uses of their grant funds. In addition, several projects drew on non-grant funds to provide services. For example, many projects helped participants obtain services from the state VR system. Those services are considered an important part of the projects' interventions and are captured in the “service used” data, but they would be excluded from estimates based only on SPI funding.

In addition to cost information, SSA is interested in any increased tax payments made by participants. The amount of tax contributed is a conceptually important outcome measure because a goal of the demonstration is to increase beneficiary self-sufficiency. Tax payments also help to highlight participants' transition from beneficiary to taxpayer.

The evaluation can use two measures of the change in tax payments. The first measure takes the narrow perspective of the Social Security trust funds: the extent to which increases in earnings are likely to generate corresponding increases in tax payments to those funds. As a rough approximation, these taxes can be estimated by multiplying the estimated increase in earnings by the current Social Security and Medicare tax rates (6.2 percent and 1.45 percent, respectively, for employees, and twice those rates for people who are self-employed). This estimate will probably overstate actual tax contributions, because not all the increased earnings will be in employment covered by these taxes; for example, some earnings in sheltered workshops are not covered. Nevertheless, this method should give SSA a general sense of the magnitude of the change in tax payments.

To estimate the change in all tax payments, the best feasible approach is to multiply the change in participants' total income by the estimated effective tax rate for low-income workers. This approach tries to capture not only changes in payroll and income taxes, but also the changes in sales, property, and corporate taxes that are borne by consumers and renters. Estimates of the change in total income will come from the core evaluation and will capture the net change in earnings and in benefit payments from SSA. A starting point for an estimate of the effective tax rate for low-income workers is the research by Joseph Pechman (1985), which indicates that low-income workers tend to pay about 20 percent of their income in taxes, mostly in payroll and sales taxes. More recently, the Congressional Budget Office (1999) has generated estimates of effective federal tax rates; the rate is about 4.6 percent for families in the lowest income quintile, but it excludes the sales, payroll, and property taxes imposed by states and localities, taxes that can fall heavily on low-income groups. In the absence of a recent comprehensive estimate of the overall effective rate, it seems best to use a mix of tax-rate estimates to generate a range of plausible changes in tax payments.
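A minimal SAS sketch of both calculations, using the rates cited above; the earnings and income changes are illustrative placeholders, and the actual inputs would come from the core net outcome estimates.

   data tax_payments;
      d_earnings = 1200;    /* illustrative net increase in annual earnings */
      d_income   = 900;     /* illustrative net change in total income
                               (earnings gain net of benefit reductions) */

      /* Trust fund perspective: employee OASDI plus Medicare rates. */
      trust_fund_taxes = d_earnings * (0.062 + 0.0145);

      /* All taxes: bracket the answer with the low (CBO, federal only)
         and high (Pechman, all taxes) effective rates. */
      total_taxes_low  = d_income * 0.046;
      total_taxes_high = d_income * 0.20;

      put trust_fund_taxes= total_taxes_low= total_taxes_high=;
   run;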

5. Methodological Analysis

After the State Projects submit their internal evaluations (which are scheduled for December 2003), SSA could use its own administrative data from the core evaluation to help interpret State Project results. For example, SSA could replicate state analyses using the set of participants and comparison group members selected by a project but using SSA administrative data rather than the project’s data. This analysis can help SSA and the states understand the extent to which differences between the core and State Project findings are due to differences in methods or to


differences in the data used. Specifically, if the findings from a replication using SSA data are consistent with those from the State Project’s own evaluation but different from those from the core, it suggests that the difference between the state and core evaluations is mostly due to differences in evaluation design rather than data source. SSA could also use core data to address sample attrition or nonresponse problems in the state evaluations (for example, states may be unable to track beneficiaries who move out of state).

B. SUMMARY OF STATE PROJECT DATA

In general, the project data currently available provide good information about the characteristics of participants, but largely incomplete information about postenrollment service use, benefit receipt, and employment. The lack of accurate service use data is especially problematic, because there is no alternative source for measuring the nature and intensity of the intervention delivered to participants. Information about participants’ receipt of SSA benefits can be obtained from the SSA administrative records, which are likely to be more accurate than the data submitted by the states. The SSA data, however, will not provide an accurate estimate of participation in other income-support programs. Similarly, SSA administrative data provide some accurate and very useful summary measures of employment. They do not, however, provide much detail about the nature of the jobs, including hours worked, wage rates, fringe benefits, occupations, and job advancement. Thus, the lack of good-quality State Project data will seriously constrain the supplemental evaluation.

The Project Office is working closely with the projects to improve data quality. It appears that complete and accurate data will be obtained from several projects, but probably not from all.

1. Overview of the Current Status of the Data Collection System

Seventeen of the 18 State Projects are collecting participant data using a set of forms that incorporate a consistent set of data definitions.2 The system requires that State Projects collect baseline data on each participant at intake as well as postenrollment data on public assistance benefits, service receipt, and employment experiences every three months after intake. State Projects submit the data to the Project Office, which then cleans and processes them. The data are stored in seven different files, some of which contain both data collected at intake and update data. Peikes et al. (2001) provide a description of how we have created preliminary analysis files from the seven source files, the data collection forms, and the operational definitions.

Collection and processing of the State Project data have been problematic. At this time, fairly complete demographic data (collected on the intake form) are available for participants who enrolled by the end of December 2001 in all 17 State Projects that submitted data. These data measure the following participant characteristics:

2The Alaska State Project has decided not to submit individual-level data about its participants.


• Disability type and severity

• Age

• Sex

• Living arrangements

• Social Security benefit receipt, type of benefit, and duration of benefit

• Private disability benefit receipt

• Education

• Job training

• Employment history (predisability, most recent postdisability, and at intake)

All other data files remain incomplete for at least some of the projects. In particular, the quality of the postenrollment data is extremely poor, so we were unable to use them to test possible supplemental analyses. We reviewed preliminary postenrollment data files for six of the State Projects where the follow-up data were thought to be most complete (Peikes et al. 2001). Yet, even for those projects, the files contained information about the first three postenrollment months for only half the participants. The Project Office has worked intensively with the projects to improve these data, and there is a sense that more complete data will be available in the late spring of 2003.

In part, this reflects the problems inherent in establishing both new programs and a new system for tracking participants. Early in the initiative, project staff concentrated on developing their programs and recruiting participants, paying less attention to their data systems. The problems also reflect a decision to give projects considerable flexibility in how they collected and reported the information. As a result, the Project Office has had to customize its data cleaning and processing procedures for each state and has had to devote considerable resources to ensuring that variables are measured consistently across the projects. Finally, the problems arise because of the difficulty projects have had tracking participants. Tracking is difficult, particularly for projects that do not offer services involving ongoing contact with participants, and in general the projects did not plan sufficiently for it. They have taken steps to improve, however, including hiring staff responsible for tracking participants and collecting follow-up data, and using administrative data to provide consistent long-term follow-up information. The Project Office has worked with the State Projects to help them be more efficient and has offered both training on the tracking methods of professional survey firms and access to sophisticated tracking databases.

The problems represent a difficult trade-off faced by all demonstrations whose overall success depends on strong programs and high-quality data. SSA, like other organizations that fund demonstrations, values both these objectives, while program operators tend to place more emphasis on service delivery. There is no perfect balance, but several modifications to the SPI experience would help future initiatives.


First, it is important to work closely with the projects from the start to establish data collection systems and ensure that sufficient resources are devoted to them; projects need a clear idea of their responsibilities from the start so that they can plan appropriately. Second, a standardized system that required all states to collect and report data in the same manner would have greatly simplified and improved the overall data collection process. The decision to give projects flexibility in how they collected and reported data may have saved some of the State Projects money in the short term, but it created substantial long-term costs, as the Project Office had to work with the projects to ensure cross-project consistency and to resolve idiosyncratic features of specific project databases. Third, data quality would have improved if the system had regularly generated reports of errors and missing data. Fourth, the projects would have benefited from early training about participant tracking; it is much easier to maintain contact than to reestablish it. For example, projects would have been more effective in tracking participants if they had collected more contact information at intake (such as the name, phone number, and address of a relative who would know where the participant lived in the event of a move). Last, it would be important to pay more attention to the resources available for tracking and data collection. Overall, there appear to have been enough resources for data collection within SPI, but they were not marshaled early enough in the process or focused efficiently.

From the start, the Project Office has worked to help the State Projects collect and report data and then to process those data into analytical files. It continues to review the quality of the intake and postenrollment data, and preliminary assessments suggest that data quality will improve for many State Projects. By late spring 2003, the Project Office expects to have complete and clean data files covering participants and activities through September 2002. A second, updated set of files will be available at the end of the project, currently scheduled for September 2003.

2. Overview of the State Project Data File Structure

The State Project data are collected from participants on five different forms.3 After processing and cleaning the data, the Project Office creates seven separate files. All the files are linked by a participant identifier that can be used to combine the various categories of information and construct a comprehensive picture of a participant's characteristics, receipt of various types of public assistance at and after intake, use of project and non-project services, and employment patterns following project enrollment. Employment records also contain a job identifier that can be used to link information about each job held by a participant. The final analysis file is constructed from these seven core data files:

3 These forms are available at www.spiconnect.org and in Khan et al. (2002).

1. Demonstration site data file contains basic information about the key contact person for the project and its geographic catchment area. Participants have a corresponding site ID that indicates their state and site so that their data can be linked to the site description data. These data are collected on the Demonstration Site Information Form.

2. Participant demographic data file stores information on demographic characteristics, type and severity of disability, receipt of Social Security benefits, and such elements as education/training, eligibility for VR services, and employment history upon enrollment into the demonstration. The variables in this file are collected on the Participant Demographic Data Form and are not altered or updated.

3. Benefits data file stores information regarding receipt of public assistance and indicates the type and amount of benefits that participants are obtaining from various government and nongovernmental sources. This file houses data collected at intake on the Participant Demographic Data Form, as well as data collected on the Participant Update Form, which is supposed to be collected and updated quarterly.

4. Services data file stores information on the types of services that participants use throughout their enrollment in the project, the hours of service received, and whether the service was funded by the project. Sixteen different types of services are tracked. This file contains data collected using the Participant Update Form, which should be updated quarterly.

5. Employment data file stores information on such job characteristics as hours, pay, and benefits, using both participant and job identifiers. State Projects are supposed to add new records to this file each time a new job is obtained (since changes in job situations alter job characteristics). They are also expected to update the employment information when they conduct the regular quarterly followups with participants. These data are collected on the Employment Data Form.

6. Placement data file stores information describing the employer and type of work performed by the participant for each job. The information stored in this file is collected whenever a new job is reported on an Employment Data Form and is not supposed to be updated.

7. Job change data file stores information pertaining to a change in the participant’s employment situation, such as changes in the number of hours worked, a different position held within a company, or a job termination. These records are updated as changes occur, based on information recorded in the Change in Employment Status Form.

With the exception of the Demonstration Site Data and the Participant Demographic Data, the other five files all contain a variable number of records for each participant. The number of records for each participant varies according to the length of time the participant has been enrolled (early enrollees should have more updates recorded in the data system). The number of records in the placement and job change files also varies to reflect each participant’s employment experience. For example, a participant who does not work following enrollment in a State Project should have an employment data file with records for each quarter after enrollment. Each record would indicate no employment. The participant should not have any records in the placement or job change files, as there are no job details or change details to record. Finally, problems in collecting data on a quarterly schedule also produce variation in the number of records in each file for each participant. In addition to the variability in the number of records each participant has, there is variation in how frequently State Projects intend to update participant information. State Projects are currently using two different schedules to collect data: (1) some State Projects collect all participant data at the end of each fiscal quarter; and (2) other State Projects collect follow-up information at the end of quarters, based on individual participants’ initial date of intake into the project.
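
To make the file structure concrete, the following sketch shows how an analyst might link the files and check how complete each participant’s quarterly updates are. It is a minimal illustration in Python/pandas; the file names and column names (participant_id, job_id, intake_date, update_date) are assumptions, since the actual SPI layouts are documented in Khan et al. (2002).

    import pandas as pd

    # Hypothetical file and column names; the actual SPI layouts appear in
    # Khan et al. (2002).
    demo = pd.read_csv("participant_demographic.csv", parse_dates=["intake_date"])
    services = pd.read_csv("services.csv", parse_dates=["update_date"])
    employment = pd.read_csv("employment.csv")
    placement = pd.read_csv("placement.csv")

    # The participant identifier links fixed intake characteristics to the
    # variable-length service history; the job identifier links each employment
    # record to its one-time placement (employer) record.
    service_history = services.merge(demo, on="participant_id", how="left")
    jobs = employment.merge(placement, on=["participant_id", "job_id"], how="left")

    # Early enrollees should have more quarterly updates. Compare the number of
    # update records with the number of full quarters elapsed since intake.
    cutoff = pd.Timestamp("2002-09-30")  # files currently run through September 2002
    quarters = (cutoff - demo.set_index("participant_id")["intake_date"]).dt.days // 91
    actual = services.groupby("participant_id").size()
    coverage = pd.DataFrame({"expected": quarters, "actual": actual}).fillna(0)

    # Participants with fewer updates than expected flag data collection problems.
    print(coverage[coverage["actual"] < coverage["expected"]])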

VII. IMPLEMENTATION AND SYNTHESIS ANALYSES

The final component of the evaluation is the implementation and synthesis analyses. The implementation analysis will give SSA detailed information about State Projects’ interventions and the service environment within which they operated. That information is crucial for interpreting the quantitative findings from the core, supplemental, and State Project evaluations. Implementation information is also critical in efforts to replicate State Projects that are found to be successful or to avoid practices that prove ineffective. Finally, the four-part evaluation strategy SSA devised requires that the quantitative and qualitative information about State Project operations and effects be synthesized. Each of the evaluation components has its own strengths and weaknesses. By combining their results within the context provided by the implementation analysis, the synthesis analysis will attempt to determine which project or project approach provides the best model for future efforts to promote employment among disabled beneficiaries.

A. IMPLEMENTATION ANALYSIS

The implementation analysis will use several sources of information. As discussed in the previous chapter, evaluators can use State Project data to develop quantitative measures of the interventions delivered to participants. These measures will help SSA understand the types and amounts of services provided. They will also provide information about the participation patterns of different types of beneficiaries, such as those with different types of disabling conditions or work histories. The State Projects and the Project Office will also collect qualitative implementation information. The Project Office summarized its plans in a document distributed to the projects at an SPI conference in March 2002 (VCU SPI Project Office 2002). Building on that document and our earlier plans for an implementation analysis (Agodini et al. 2002), we recommend that the following aspects of project implementation be documented and analyzed:

• Design. The goals and conceptual basis of the State Projects must underlie the evaluation. One component of the implementation analysis should therefore describe the reasons states undertook SPI projects. It is particularly useful to understand the service system gaps and problems that led states to develop the specific interventions they tested in SPI.

• Recruitment. An understanding of recruitment processes is essential to helping SSA interpret the enrollment levels observed in the demonstration and to plan for future employment-support initiatives. For example, did any outreach and recruitment strategies appear to be particularly effective? Did the recruitment efforts appear extensive enough to ensure that all (or even most) eligible beneficiaries would be likely to know about project services? Was recruitment sufficiently active that the participation levels observed in the initiative could be considered indicative of participation in an ongoing program?

• Operations. Information about operations will help SSA understand the nature of the services provided and the processes used to deliver them. Replication efforts will be particularly interested in the types of staffing used, the staff-to-participant ratios, any special staff training, methods to promote service continuity for participants, and the types and magnitude of nonstaff resources required. Comparisons of net outcomes the projects generate can be interpreted only if the evaluation has detailed information about the specific services that were offered, including their average duration and intensity. There will be specific interest in the operations of the SSI waivers offered in four of the states (California, New York, Vermont, and Wisconsin). Furthermore, in order to understand whether there are differences in the interventions received by different enrollment cohorts, the evaluation should also document any substantial changes that occur over time in the nature or intensity of the State Project interventions.

• Context and System Change. Information about the service and policy context for each project will help SSA understand the extent to which the State Project services differ from what is available elsewhere in the demonstration and comparison communities. It is particularly useful to know whether the projects operate in areas that already have an extensive service system to support work by people with disabilities. Contextual information could also help evaluators assess whether the comparison group was contaminated by other state or local demonstrations that were implemented during the course of the project. Finally, it is important to understand the extent to which the State Projects themselves changed the service environment.
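
The quantitative intervention measures mentioned at the start of this section can be illustrated with a short sketch. Assuming a services file with columns participant_id, service_type, and hours (illustrative names, not the actual SPI variables), the fraction of participants receiving each of the 16 tracked service types and the average hours among recipients might be tabulated as follows:

    import pandas as pd

    # Hypothetical services file: one record per participant, quarter, and
    # service type, as described in Chapter VI.
    services = pd.read_csv("services.csv")

    n_participants = services["participant_id"].nunique()

    # For each of the 16 service types: how many participants ever received it,
    # and how many hours did recipients receive on average?
    by_type = services.groupby("service_type").agg(
        recipients=("participant_id", "nunique"),
        total_hours=("hours", "sum"),
    )
    by_type["pct_receiving"] = 100.0 * by_type["recipients"] / n_participants
    by_type["mean_hours"] = by_type["total_hours"] / by_type["recipients"]

    print(by_type.sort_values("pct_receiving", ascending=False))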

In January 2001, the 12 SSA-funded State Projects each submitted a summary of their implementation plans and progress. By then, all of them had enrolled some participants and had begun to deliver services. In addition, many of the states had actively pursued system-change activities, which appear to have produced better communication between the various state and federal staff involved with promoting employment for people with disabilities. In some cases, those efforts have helped states obtain funding to support additional services or to alter their policies so that they can offer new employment incentives (such as Medicaid coverage on a sliding premium scale for workers with disabilities). Although the RSA-funded states have not provided such systematic reports of their activities, they have provided some other information about sample recruitment, State Project activities, and systems change efforts.

The SSA-funded states plan to provide similar implementation information as part of their final reports due in December 2003. We have identified the data we would like the State Projects to obtain and provide to the net outcomes evaluation. The SPI Project Office will work with the State Projects to obtain these data, and we expect that all SSA-funded projects will provide most of the information requested. The Project Office asked the RSA-funded projects for the same information and expects that all but Alaska will provide at least some. The desired implementation information is described in the following sections.

1. State Project Designs

Much of the information about the rationale underlying the design of the State Project interventions was provided in the projects’ January 2001 reports. For the final implementation report, the Project Office will ask the State Projects to review their initial submissions and update them to reflect any changes made to their designs during the second half of the initiative. The design and background information include the following items:

• Background and State Context

- What was the context and motivation for the demonstration project?

- What aspects of the existing system were targeted for change?

- What gaps in the existing service system motivated the design and implementation of the demonstration? Were there specific barriers to employment that the state sought to address? How does the state see the State Project changing the services used by disabled SSA beneficiaries?

- What were the major programmatic or economic trends that formed the background against which the interventions were designed and implemented (for example, Medicaid Buy-In programs, state executive initiatives, budget crises, consumer activities)?

• Governance

- Summarize the project’s governance structure. What state agency led the project? What other agencies were involved and to what extent?

- Describe advisory committees, work groups/task forces, or other types of oversight or governance groups.

- Describe the composition and functions of all governance groups.

• Targeted Outcomes

- What behaviors did the project intend to change in beneficiaries? All projects sought to improve employment and earnings. They also intended those earnings increases to reduce dependence on SSI and SSDI benefits. Were other behaviors targeted?

- What aspects of the service system did the project intend to change?

2. Project Outreach and Recruitment

To understand the characteristics of the people who participated in a State Project, one must understand the outreach and recruitment processes. Descriptive statistics provide a lot of information about the participants, but information about outreach and recruitment can help the evaluation understand the level of motivation or service-system savvy among participants. For example, was outreach sufficiently limited that only people who were already receiving services would have been likely to hear of the project and enroll? Alternatively, was outreach sufficiently broad that most eligible beneficiaries could have been assumed to know about the project? A summary of the intake process is needed to understand the types of formal and informal eligibility criteria that states may have imposed. In particular, was project intake a simple process that would encourage entry into a project, or did it involve multiple steps, travel, or testing that might have discouraged less-motivated participants?

The specific issues that State Projects should describe include the following:

• Outreach

- What methods were used to disseminate information about the project? These ranged from meetings with potential referral agencies to direct mailings to eligible beneficiaries.

- What steps did projects take to establish their credibility? Potential participants may see a new project as being unable to deliver sustained interventions or may misunderstand its goals and potential benefits. Thus, it is important for new programs to establish themselves among beneficiaries and the information and referral sources they trust.

• Participant Recruitment and Enrollment

- Who was eligible to participate?

- Did eligibility criteria change over time? If so, when? In what ways? Why?

- What were the primary sources of project referrals (for example, beneficiary self-referral, referral from VR, One-Stops, mental health centers, or centers for independent living)? Did these vary by disability type?

• Intake and Enrollment Procedures

- Where was intake located? How accessible was that location?

- How long did the intake process take?

- How easy was it for participants to complete the consent form required for SPI participation? Did many potential applicants decline to give consent?

- Did the project collect additional information at intake beyond what was on the SPI intake form? If so, what was this information and how was it used?

3. State Project Interventions

State Projects should provide detailed information about the interventions they delivered or arranged for participants. Much of this information was provided in the earlier implementation reports, but projects should update it and describe in detail any new interventions. Projects should pay specific attention to any substantive changes in the nature or intensity of their interventions. For example, about two years into the initiative, four states received authority to offer their participants waivers from some SSI rules. These waivers are expected to decrease the work disincentives inherent in current SSI rules; one waiver allows beneficiaries with earnings to keep almost twice as much of their benefits as do current rules.
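
The near-doubling can be illustrated with a simplified benefit calculation. The sketch below is illustrative only: it assumes the standard SSI offset of $1 of benefits withheld per $2 of countable earnings, a hypothetical waiver offset of $1 per $4, the 2002 federal benefit rate of $545, and the standard $20 general and $65 earned income exclusions.

    def ssi_benefit(monthly_earnings, offset=0.50, fbr=545.0,
                    general_exclusion=20.0, earned_exclusion=65.0):
        """Monthly SSI payment given earnings, under a simplified benefit formula."""
        countable = max(monthly_earnings - general_exclusion - earned_exclusion, 0.0)
        return max(fbr - offset * countable, 0.0)

    earnings = 800.0
    standard = ssi_benefit(earnings)             # $1-for-$2 offset: about $187.50
    waiver = ssi_benefit(earnings, offset=0.25)  # $1-for-$4 offset: about $366.25
    print(standard, waiver, waiver / standard)   # the ratio is roughly 2

At $800 in monthly earnings, the standard rules leave a benefit of about $188, while the $1-for-$4 offset leaves about $366, nearly twice as much, which is consistent with the description above.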

The information describing the interventions should include the following items:

• Major Intervention Strategies

- Describe the major intervention strategies used. These include benefits planning and assistance, service delivery from within One-Stop Centers, direct employment services and employment supports, vouchers, peer supports, Medicaid buy-in, and waivers.

- What fraction of participants received each intervention? How were the interventions linked? What was the average length of participation in each intervention, and what was the average intensity (for example, how many staff hours were spent delivering an intervention to those who received it)?

- For the four states with SSI waivers, when did the waivers become operational in your state? How were they implemented? What was done to ensure full implementation? What data are available about people’s use of the waivers?

• Facilities and Resources

- What facilities and resources did the project use to implement the demonstration?

- Were the resources created, developed, borrowed, or modified?

- What computers were used to provide services?

- To what extent were facilities and resources financed by the SPI demonstration or other agencies?

• Staffing

- Describe the number and background of project staff.

- Describe training activities of the demonstration, focusing on who is trained, the type of training (in-service, on-line/distance instruction, certificate programs, etc.), and the amount of training provided (hours, days, etc.).

- As appropriate, describe training to people not employed by the project, and the extent to which technical assistance is available to staff.

• Services

- What services did project staff deliver directly to participants?

- Did staff refer participants to other services? Which services?

- What is a typical profile of contacts with participants and services provided to them? How did project staff plan service activities?

- Where were project staff located?

- How much time, on average, does the project spend with each participant?

• Cost

- What percentage of project resources were devoted to each intervention component identified in the overall intervention strategy (for example, benefits counseling or training)?

- What is the average cost of delivering each intervention component?

• Geographic Uniformity

- To what extent was the project’s intervention delivered consistently in all project sites? State Project catchment areas ranged from as small as a few towns in New Hampshire to statewide in several states. Do project administrators feel that the same intervention was delivered throughout their catchment area, or were there substantial variations?

• Data Collection

- What data sources were used to complete each of the data items provided by the State Projects? Several projects collected some information directly from participants and other information from administrative data sources. Understanding the source of each data item for each project is important for interpreting accurately any observed differences among projects.

- Did the data sources or data quality vary over time or among project sites?

- What aspects of the SPI data collection process worked well or not so well? This information will help SSA design data collection processes for future initiatives or demonstrations.

4. Project Environment

A State Project’s environment affects operations in several ways. Most important, a project’s service environment includes the services participants could have used in the absence of the State Project. It therefore determines the basic policy comparison tested in the initiative: the difference between the status quo service system and a system that includes the State Project services. In addition, projects may rely on other agencies or providers to deliver key services to SPI participants. For example, several projects required participants to be enrolled with the state vocational rehabilitation agency and relied on that agency to provide skills training and other employment-promotion services. Finally, the nature of the local economy affects the employment opportunities for participants and therefore the net outcomes a project is likely to produce. Aspects of the environment that projects should describe include the following:

• Where else could project participants obtain employment support services? How easy would it be for the participant to obtain such services? How did those services differ from what the project offered?

• To what extent did the employment and service environment vary among the areas in which the project offered services? Were any of these differences large enough to change the nature of the intervention or its intended net outcomes?

• Did the project rely on outside agencies or providers to deliver services to participants? In particular, did a project have any formal or informal agreements with other providers to deliver services that supplemented those the project provided directly?

• Were there any other major programs operating in the project’s catchment area that offered employment support services for people with disabilities? What services did they provide? In particular, how did the State Project interact with the Ticket to Work, Benefits Planning, Assistance, and Outreach, Medicaid Infrastructure Grant, and Workforce Improvement Grant initiatives?

• Did the Medicaid program or other programs used by participants change substantially during the project? In particular, was a Medicaid Buy-In program implemented as SPI services were being delivered?

• What was the general state of the local economy during the project’s operation? Were there any major events that might have substantially altered employment opportunities (did a manufacturer move into the area, a prolonged strike take place, a plant close)?

5. General Assessment of Operational Success

Finally, it is useful to have the projects’ own assessments of their operations. Some projects have already indicated that they wish they had offered a broader array of services or otherwise changed some aspects of their operations. These assessments are invaluable for replication efforts and for assessing whether observed net outcomes would be different in an ongoing program that had time to refine its operations. Specifically, it would be helpful to have projects’ views on the following questions:

• Did the project successfully address the goals initially established for it?

• Would project staff change aspects of their operations? If so, what would they change and why? What do they know now that they wish they had known when they started their project?

• What features appeared to be associated with the success or failure of the program? If program features were not implemented as planned, how important was the change to the success or failure of the program?

• What would the State Project recommend, based on its experiences with SPI, to another state interested in promoting employment for people with disabilities?

6. State Project Systems Change Activities

In addition to interventions targeted to individuals, the State Projects sought to change aspects of the overall systems that promote or support employment for people with disabilities. Systems change activities sought to improve coordination among organizations and to eliminate major obstacles to employment for all people with disabilities in a demonstration area, not just participants. Examples include employer liaison activities, interagency collaboration strategies, and information dissemination. The final implementation analysis should therefore ask about the following aspects of the projects’ systems change efforts:

• What systems change activities did the State Project undertake? What was the timeline for systems change activities and any observed system changes?

• What state and local activities have changed as a result of the project’s system change efforts?

• Were there any other efforts in the state that facilitated or inhibited the project’s efforts?

• If any interagency collaboration has been targeted, please describe the participating agencies, any agreements or results of the collaboration, and any agencies that might participate in the future. To what extent have these efforts resulted in SSA staff becoming more routinely involved in planning and coordination meetings among state organizations that seek to promote employment among people with disabilities?

• Have the systems change efforts resulted in more effective information sharing among organizations that help disabled SSA beneficiaries find and maintain employment? Such efforts may help training and job placement organizations better understand SSI and SSDI work incentives and policies. They may also increase data sharing, such as by providing benefits counselors with information about a beneficiary’s participation in a wide range of income support programs.

• Did the SPI activities lead to or support other grant acquisition by the states, particularly with respect to solicitations from SSA? If so, what were the specific acquisitions, and what success did the state have? The evaluation should pay special attention to the extent that SPI helped states obtain Medicaid infrastructure grants, a Benefits Planning, Assistance, and Outreach contract during the first round of awards, and a DOL Workforce Improvement Grant.

• To what extent have State Project activities improved or assisted SSA operations? For example, do benefits counselors routinely assist SSA field office staff in monitoring beneficiary employment and thereby help reduce overpayments?

B. SYNTHESIS OF THE EVALUATION FINDINGS

Final judgments about SPI will be based on a synthesis of the major evaluation components: the Project Office process analysis, the State Project internal evaluations, the final core net outcomes evaluation, and any supplemental evaluation. The synthesis must reconcile the quantitative estimates, particularly those from the State Project and core net outcomes evaluations. It must also use the qualitative information to help explain the cross-project differences in net outcomes. Its goal should be to develop a consistent story about the initiative and its findings so that the findings can inform, rather than confuse, policymakers.

In reconciling the various quantitative estimates, the synthesis effort should focus on three factors. First, results are likely to differ because of different methodological biases inherent in the various evaluation designs the projects and core evaluation use. Second, the data used in the various evaluations differ in their accuracy, time period covered, and samples. Third, nonresponse and attrition are likely to differ among estimates and may produce different results. To reconcile these factors, the synthesis should prepare a table that summarizes key estimates. It can then assess the most likely causes of differences and prepare a synopsis of the differences and their implications for interpreting the findings. This effort will draw extensively on the technical assistance the Project Office will provide to the states.

In some cases, it will be easy to identify the best set of estimates, because one set will be distinguished by the accuracy of its evaluation design and data. In other cases, where no single estimate dominates on methodological grounds, it will be harder to identify the best conclusion. Under these circumstances, the most favorable situation is one in which all the estimates provide the same qualitative conclusion, which would suggest that the findings are robust with respect to estimation methods and should give policymakers some confidence. A less favorable situation comes about when the results disagree. If this happens, the evaluation should explore the extent to which any differences in the data used to produce each set of estimates explain the differences in results. For example, if a net outcome estimate produced by the core component differs from the one produced by a state’s internal evaluation, an evaluator could use SSA data to produce alternative net outcome estimates with the state’s internal evaluation design. If the core and alternative net outcome estimates are similar, that would provide strong evidence that differences in data explain the difference between the core and original net outcome estimates produced by a state’s internal evaluation. If the core and alternative net outcome estimates differ, further analyses will be required to understand the differences and their implications for the results.
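
One way to organize this reconciliation is a table with one row per State Project, columns for each set of estimates, and a flag for whether the sets support the same qualitative conclusion. The sketch below is illustrative only: the agreement rule (same sign and the same significance verdict at the 5 percent level) and the placeholder values are assumptions, not SPI results.

    import pandas as pd

    def same_conclusion(est1, se1, est2, se2, z=1.96):
        """Do two estimates support the same qualitative conclusion: both
        insignificant, or both significant with the same sign?"""
        sig1, sig2 = abs(est1) > z * se1, abs(est2) > z * se2
        if not sig1 and not sig2:
            return True
        return sig1 and sig2 and (est1 > 0) == (est2 > 0)

    # Hypothetical reconciliation table: the core estimate versus the alternative
    # estimate produced by applying a state's own design to SSA data.
    rows = [
        {"project": "State A", "core": 850, "core_se": 300, "alt": 700, "alt_se": 350},
        {"project": "State B", "core": 150, "core_se": 400, "alt": -90, "alt_se": 380},
    ]
    table = pd.DataFrame(rows)
    table["agree"] = [
        same_conclusion(r.core, r.core_se, r.alt, r.alt_se) for r in table.itertuples()
    ]
    print(table)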

We suspect that the greatest cause of variation among estimates will be differences in the implicit comparisons made. In particular, it appears that the underlying status quo with which the State Projects are being compared will differ substantially across states. Even when two states field similar interventions, they may produce different impacts because the comparison environments in the two states differ. For example, comparison group members in one state may have better access to vocational supports or benefits counseling than comparison group members in another state. The Project Office implementation analysis should help to describe these comparisons and thus provide a basis for understanding the differences among projects.

The synthesis effort will draw on the effort to reconcile the net outcome estimates. It will compile information from the four evaluation components and try to assess which characteristics of State Projects and participants are most closely correlated with positive net outcomes. Its overall goal will be to examine SSA’s four policy issues:

1. The relative effectiveness of the individual State Projects

2. The effect of participant and local area characteristics on project effectiveness

3. The common elements of successful project designs

4. The lessons that were learned from the State Projects and that would be useful to other states and localities

The synthesis analysis will rely primarily on qualitative analysis methods. Although quantitative methods such as regression models for nested (or hierarchical) data and classification algorithms (such as discriminant analysis) can be used to identify demonstration characteristics or program components that are associated with successful outcomes, they are limited by the small number of projects in SPI and the large number of factors that could explain differential outcomes for the various State Projects.

The qualitative analysis will contrast program impacts with measures of local environment and program operations. Decker and Thornton (1995) used this process to identify aspects of successful training programs for SSI beneficiaries with mental retardation. This method would develop a table that lists states according to the magnitude of their project’s estimated net effect on earnings. We recommend that the synthesis focus on earnings effects, because they underlie changes in benefits, program participation, and, ultimately, self-sufficiency. Nevertheless, it will also be useful to track the variation in the other major outcomes: employment rates, SSI/SSDI participation and benefit receipt, and total income.

In addition to the net outcome estimates, the table would list key aspects of the labor market and local service environment—aspects that are expected to have a strong influence on increases in net earnings. Evaluators can obtain information about these factors from Census and other published statistics and from the Project Office’s analysis of implementation. At this time, we expect the set of factors to include the following types of measures (data on most of these items were used in the comparison area selection process described in Chapter II and so are easily available to the synthesis analysis):

• Population density in the demonstration area

• County-level unemployment rate and recent trends in unemployment rates

• The extent of farming and manufacturing employment

• Per capita community mental health spending

• Extent of public transportation

The table should also include key program characteristics that reflect the nature and type of interventions being delivered. Evaluators can obtain data about these factors from the Project Office implementation analysis and from the participation data provided by the State Projects. Measures are likely to include:

• Types of intervention (using the dimensions shown in Table I.1)

• Whether the intervention is statewide

• Number and types of employment barriers addressed by the project

• Cost per participant or other measure of service intensity

• Average length of participation among participants

• Fraction of participants who obtain a job while receiving State Project services

• Number of clients served by a State Project

Last, the table should describe the participant characteristics for each State Project. The projects have targeted different groups of beneficiaries and are using different types of recruitment strategies. The key characteristics should include the following:

• Types of disabling conditions

• Fraction of participants employed at enrollment

• Fraction served by the VR system at enrollment

• Demographics

• The mix of SSI and SSDI beneficiaries

• Length of time participants have received SSI or SSDI benefits prior to enrollment

The final selection of factors should be based on findings of the process analyses conducted by the Project Office and the individual State Project evaluations, as well as on discussions with SSA staff.

Once the table of outcomes and influencing factors has been prepared, the evaluator should look at the high-performing projects to see whether they tended to differ from the low-performing projects. In addition, the evaluator should look to see whether similar State Projects produce similar net outcomes. Finally, to determine whether particular groups consistently do better than others, the evaluation should examine estimates for participant subgroups defined by demographics, work history, and disabling condition.
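
A minimal sketch of that comparison, assuming a hypothetical synthesis table (the project labels, effect estimates, and factor values below are placeholders, not SPI results):

    import pandas as pd

    # Hypothetical synthesis table: one row per State Project. Values are
    # placeholders for illustration, not actual SPI estimates.
    table = pd.DataFrame({
        "project": ["A", "B", "C", "D"],
        "earnings_effect": [900, 500, 150, -100],   # estimated net effect on earnings ($)
        "unemployment_rate": [4.1, 5.0, 6.3, 7.2],  # county-level rate (%)
        "cost_per_participant": [2500, 1800, 1200, 900],
        "statewide": [True, False, True, False],
    })

    # Rank projects by the magnitude of their estimated earnings effect, then
    # split at the median and compare the environment and program factors of
    # high- and low-performing projects.
    table = table.sort_values("earnings_effect", ascending=False)
    table["high_performer"] = table["earnings_effect"] >= table["earnings_effect"].median()
    print(table.groupby("high_performer")[["unemployment_rate", "cost_per_participant"]].mean())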

REFERENCES

Agodini, Roberto, Craig Thornton, and Nazmul Khan. “Design for Estimating the Net Outcomes of the State Partnership Initiatives: Preliminary Final Report.” Princeton, NJ: Mathematica Policy Research, August 2002.

Agodini, Roberto, and Mark Dynarski. “Are Experiments the Only Option? A Look at Dropout Prevention Programs.” Princeton, NJ: Mathematica Policy Research, August 2001.

Agodini, Roberto, Kate Bartkus, Vinita Jethwani, Theresa Kim, Nora Paxton, Deborah Peikes, Rachel Sullivan, and Craig Thornton. “Initial Assessment of SSA Administrative Data for Use in the Net Outcomes Evaluation of the State Partnership Initiative.” Princeton, NJ: Mathematica Policy Research, Inc., 2001.

Ashenfelter, Orley. “The Case for Evaluating Training Programs with Randomized Trials.” Economics of Education Review, vol. 6, no. 4, 1987, pp. 333-338.

Ashenfelter, Orley. “Estimating the Effect of Training Programs on Earnings.” Review of Economics and Statistics, vol. 60, 1978, pp. 47-57.

Congressional Budget Office. “Preliminary Estimates of Effective Tax Rates.” Washington, DC: CBO, September 7, 1999.

Decker, Paul T., and Craig Thornton. “Long-Term Effects of Transitional Employment Services.” Social Security Bulletin, vol. 58, no. 4, winter 1995, pp. 71-81.

Dehejia, Rajeev H., and Sadek Wahba. “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs.” Journal of the American Statistical Association, vol. 94, no. 448, 1999, pp. 1053-1062.

Dehejia, Rajeev H., and Sadek Wahba. “Propensity Score Matching Methods for Non-Experimental Causal Studies.” Working paper 6829, Cambridge, MA: National Bureau of Economic Research, December 1998.

Heckman, James J., and V. Joseph Hotz. “Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training.” Journal of the American Statistical Association, vol. 84, no. 408, 1989, pp. 862-880.

Heckman, James J., Hidehiko Ichimura, Jeffrey Smith, and Petra Todd. “Characterizing Selection Bias Using Experimental Data.” Econometrica, vol. 66, no. 5, 1998, pp. 1017-1098.

Jethwani, Vinita, Debbie Peikes, Nora Paxton, and Kate Bartkus. “Characteristics of State Project Participants Enrolled Through December 31, 2001.” Princeton, NJ: Mathematica Policy Research, Inc., August 2002.

Khan, Nazmul, Nora Paxton, Kate Bartkus, Roberto Agodini, and Craig Thornton. “File Construction and Data Processing for Estimating the Net Outcomes of the State Partnership Initiatives.” Princeton, NJ: Mathematica Policy Research, September 2002.

Kornfeld, Robert J., Michelle L. Wood, Larry L. Orr, and David A. Long. Impacts of the Project NetWork Demonstration: Final Report. Washington, DC: Abt Associates Inc., March 1999.

LaLonde, Robert. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data.” American Economic Review, vol. 76, no. 4, 1986, pp. 604-620.

Mardia, K.V., J.T. Kent, and J.M. Bibby. Multivariate Analysis. London: Academic Press, 1979.

Orr, Larry L. Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage, 1999.

Pechman, Joseph A. Who Paid the Taxes, 1966-85? Washington, DC: The Brookings Institution, 1985.

Peikes, Deborah. “Participation in the Iowa and North Carolina State Partnership Initiative Projects.” Presentation to the Fourth State Partnership Systems Change Initiative Annual Meeting, Washington, DC, August 1-2, 2002.

Peikes, Deborah, Vinita Jethwani, and Kate Bartkus. “State Partnership Initiative: Participant Characteristics and Early Experiences.” Princeton, NJ: Mathematica Policy Research, December 2001.

Pickett, Clark. SSI Disabled Recipients Who Work: December 2000. Baltimore, MD: Social Security Administration, 2001.

Prero, Aaron J., and Craig Thornton. “Transitional Employment Training for SSI Recipients with Mental Retardation.” Social Security Bulletin, vol. 54, no. 11, 1991, pp. 2-23.

Rosenbaum, Paul R., and Donald B. Rubin. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika, vol. 70, 1983, pp. 41-55.

Rubin, Donald B., and Neal Thomas. “Characterizing the Effect of Matching Using Linear Propensity Score Methods with Normal Distributions.” Biometrika, vol. 79, no. 4, 1992, pp. 797-809.

Rubin, Donald B. “Matching to Remove Bias in Observational Studies.” Biometrics, vol. 29, 1973, pp. 159-183.

Rubin, Donald B., and Neal Thomas. “Combining Propensity Score Matching with Additional Adjustments for Prognostic Covariates.” Journal of the American Statistical Association, vol. 95, no. 450, 2000, pp. 573-585.

Thornton, Craig, Roberto Agodini, and Vinita Jethwani. “Design for Evaluating the Net Outcomes for the State Partnership Initiative.” Princeton, NJ: Mathematica Policy Research, Inc., October 2000.

U.S. Bureau of the Census. County and City Data Book—1994. Washington, DC: U.S. Government Printing Office, May 1995.

_______________. “Estmod95 (estimates of poverty and income for 1995).” 2000a. [www.census.gov/housing/saipe/estmod95/] and [www.census.gov/hhes/www/saipe/stcty/sc95ftpdoc.html] Accessed August 18, 2000.

_______________. “County Population Estimates for July 1, 1999 and Population Change for April 1, 1990 to July 1, 1999.” 2000b. [www.census.gov/population/www/estimates/co_99_2.html] Accessed September 11, 2000.

_______________. “Land Area, Population, and Density for States and Counties: 1990.” 2000c. [www.census.gov/population/censusdata/90den_stco.txt] Accessed September 5, 2000.

_______________. “State and County QuickFacts 2000.” 2002a. [http://Quickfacts.census.gov/qfd/] Accessed March 2002.

_______________. “Economic Census, Manufacturing: Geographic Area Series: Industry Statistics for the States, Metropolitan Areas, Counties, and Places – 1997.” 2002b. [http://factfinder.census.gov/servlet/EconSectorServlet?_SectorId=31&survey=Economic%20Census&_ts=53341617080] Accessed January 2002.

U.S. Department of Agriculture, National Agricultural Statistics Service. [www.census.gov/population/censusdata/90den_stco.txt] Accessed September 5, 2000.

_______________. National Agricultural Statistics Service. [www.nass.gov/census/census97/highlights/ST/ST.htm] (where ST is the postal abbreviation of the state for which data are being obtained). Accessed September 1, 2000.

U.S. Department of Labor, Bureau of Labor Statistics. “State and County Employment and Wages from Covered Employment and Wages, 1997-2000.” 2001. [www.bls.gov/cew/home.htm] Accessed January 2001.

_______________. “Local Area Unemployment Statistics: Geographic Concepts.” Washington, DC: BLS, 1996 [http://stats.bls.gov/laugeo.htm] Accessed October 13, 2000. Note: the site has since moved to http://www.bls.gov/lau/laugeo.htm

_______________. “Local Area Unemployment Statistics.” 2000b. [ftp://ftp.bls.gov/pub/time.series/la] Accessed October 9, 2000.

_______________. BLS Handbook of Methods. Bulletin 2490. April 1997.

VCU SPI Project Office. “Overview of SPI Process Evaluation Activities.” National Workforce Inclusion Conference, Los Angeles, CA, March 11, 2002.

TABLE OF ACRONYMS

BLS U.S. Department of Labor, Bureau of Labor Statistics
CBO Congressional Budget Office
CMS Centers for Medicare and Medicaid Services
D-in-D Method of Difference-in-Differences
DOL U.S. Department of Labor
ENP Eligible Non-Participants
IRWE Impairment-Related Work Expense
LMA Labor Market Area
MBR Master Beneficiary Record (basic SSDI administrative file)
MDD Minimum Detectable Differences
MEF Master Earnings File
MPR Mathematica Policy Research, Inc.
PASS Plan for Achieving Self-Support
REMICS Revised Management Information Counts System
RSA Rehabilitation Services Administration
SAMHSA Substance Abuse and Mental Health Services Administration
SAS Statistical programming package produced by SAS Institute
SER Summary Earnings Record
SGA Substantial Gainful Activity
SPI State Partnership Initiative (this initiative is also referred to as the State Partnership and Systems Change Initiative)
SSA Social Security Administration
SSDI Social Security Disability Insurance (Title II of the Social Security Act)
SSI Supplemental Security Income (Title XVI of the Social Security Act)
SSN Social Security number
SSR Supplemental Security Record
TANF Temporary Assistance for Needy Families
TETD Transitional Employment Training Demonstration
UI Unemployment Insurance system
VCU Virginia Commonwealth University
VR Vocational Rehabilitation System