Predicting successful and unsuccessful transitions from school to work by using sequence methods


WORKING PAPER SERIES

NO. 55

PREDICTING SUCCESSFUL AND UNSUCCESSFUL TRANSITIONS FROM SCHOOL TO WORK USING SEQUENCE METHODS

DUNCAN McVICAR and MICHAEL ANYADIKE-DANES

NORTHERN IRELAND

ECONOMIC RESEARCH CENTRE

Predicting Successful and Unsuccessful Transitions from School to Work Using Sequence Methods

August 2000

Duncan McVicar and Michael Anyadike-Danes

Northern Ireland Economic Research Centre
46-48 University Road, Belfast BT7 1NJ, United Kingdom

Tel: 44 (0)28 9026 1800
Fax: 44 (0)28 9043 9435

Email: [email protected]: [email protected]

We are grateful to the Training and Employment Agency for supporting the collection of the original survey data on which this paper is based. The research is part of NIERC’s Human Resources and Economic Development research programme supported by the Department of Enterprise, Trade and Investment, and the Department of Finance and Personnel. All views are those of the authors.

Abstract

Policy makers recognise the importance of early identification of young people that

are likely to end up jobless on entry to the adult labour market. This paper uses

sequencing techniques to characterise 712 young people’s transitions from school to

work into ‘types’, with jobless types interpreted as unsuccessful transitions. A logit

model is estimated for transition type using a collection of static individual, family

and school characteristics. This allows us to identify which young people are most

likely to experience unsuccessful transitions into the adult labour market. Policy

makers might use such information to target social and educational policy more

effectively to promote social inclusion.

Young people with the following characteristics at age 16 (in order of importance) are

more likely to experience an unsuccessful transition into the adult labour market:

• Poor qualifications;
• Coming from a disadvantaged area of Northern Ireland;
• Having an unemployed father;
• Coming from a single parent family;
• Being female; and
• Being Catholic.

1. Introduction

Policy makers have long recognised the importance of ‘catching problems early’ if

they are to be effectively dealt with in the most efficient way. This belief, that

‘prevention’ is better than ‘cure’, now pervades much of social policy in the US and

the UK. In particular, governments now believe that if they can catch the future long-

term jobless early enough with targeted interventions then they may be able to prevent

a slide into social exclusion. In this paper we identify a typology of transitions from

school to work, one of which is essentially a transition into long-term joblessness, and

show how young people’s background characteristics influence their chances of

experiencing a particular type of transition. Such findings might potentially be used

by policy makers to help target early interventions more effectively.

Our data consist of a vector of static background characteristics and a time series

sequence of 72 monthly labour market activities for each of 712 individuals in a

cohort survey. The sequences follow each young person from the month they are first

eligible to leave compulsory education (July 1993) for a further six years. Our

problem is how best to use these data to identify ‘at risk’ young people at age 16 and to

characterise their post-school career trajectories. One possibility is to construct a

dynamic stochastic model (e.g. a Markov model) where activity in one month is some

function of activity in the previous month coupled with the static characteristic

variables.1 The approach adopted here takes a different route, where the raw

sequences are analysed using optimal matching and cluster analysis to enable

classification into a simple typology of transition patterns.

Our motivation for the adoption of these methods is threefold. First we feel there is

worth in introducing these sequence methods to an audience that may not routinely

come across them in the economics literature, despite their role in other scientific

disciplines. Second, these sequence methods are ideally suited to the problem of

reducing the dimensionality of our large database down to manageable and readily

interpretable dimensions. Third, our data are ideally suited to these sequence methods

1 This approach is adopted in a companion paper with the same data set and for similar purposes (see Anyadike-Danes and McVicar, 2000b).

and provide an application which goes further, in terms of successfully estimating a

regression model with sequence ‘types’ on the left hand side, than has previously been

possible.

The dependent variable in our regression analysis is transition ‘type’ – a fivefold

categorisation of economic and educational activity over the six years immediately

following completion of compulsory education (i.e. from age 16 to age 22). This is

discussed in Section 5. The classification of transition types is based on a cluster

analysis of the data and is discussed in Section 4. The cluster analysis in turn is based

on a distance matrix computed using optimal matching techniques (developed in

genetics to compare complex protein sequences). This is discussed in Section 3.

Finally, Section 6 discusses the sensitivity of the results to changes in assumptions at

the various stages, and Section 7 concludes. The following section introduces the data

used in our application.

2. The Data and the Context

Our data are taken from the 1999 sweep of the Status Zero Survey (see McVicar et

al., 2000). The survey was first carried out, by means of face-to-face interviews, in

June 1995 with a sample of 980 young people from Northern Ireland that had

completed compulsory education two years previously.2 For each sample member

monthly activity (e.g. at school, at further education college (FE), in training,

employment or jobless) was collected for the intervening two years along with a

considerable amount of background information on qualifications levels, parental

employment status etc. In the June 1999 follow-up sweep, this background data and

monthly activity information were updated to cover the first six years following

completion of compulsory education, or the years from age 16-22 (and higher

education (HE) was added as an activity category to reflect this older age group). The

sample size was 712 in this most recent sweep and the panel is fully balanced.

Appropriate weights adjust for response bias.

2 There are around 24,000 young people in this cohort in total.

School is compulsory in Northern Ireland until age 16 at which time (at the end of the

academic year) young people are faced with a number of choices.

• Around half of the 1993 cohort stayed on at school either for one year (to retake previously failed examinations or to take one-year vocational courses) or for two years (to take an academic qualification (A-Level), with a small proportion taking a vocational qualification (GNVQ)). Some young people, by combining one-year and two-year courses, or by repeating a year, stayed on for three years. School generally leads on to higher education (HE) or employment directly.

• Around 20% of the cohort entered FE at 16, taking a similar mix of courses although more weighted towards the vocational end of the spectrum. This is also a stepping stone to HE but many FE graduates enter employment directly.

• A further 20% of the cohort entered government sponsored training schemes, often based with an employer and involving work experience plus study towards a vocational qualification. This tends to lead to employment directly.

• The remaining 10% of the cohort entered employment or joblessness directly at 16.

Figure 1 shows the pattern of the cohort’s economic activities from the October

following completion of compulsory education (age 16) to age 22. Most young people

are in some form of full time education or training for the first two years. Most end up

in employment by age 22, although many are still in HE and a significant proportion

are jobless. There are three discrete jumps in the proportions engaged in the various

activities, corresponding with the summers of 12, 24 and 36 months after completing

5th Form. Firstly, at age 17, there is a discrete fall in numbers attending school and a

corresponding rise in numbers in employment. This is repeated at age 18,

accompanied by a fall in numbers in training and a proportion entering HE. Finally, at

19, school numbers drop to zero, FE and training display a significant discrete fall and

HE, employment and joblessness display discrete rises (the latter slightly delayed).

For the remainder of the sample period, numbers in FE, HE and training fall, whilst

joblessness and employment rise.

In order to create a typology of youth transitions, we need a method of comparing the

similarities between the 712 sequences of 72 monthly activity variables that lie behind

the aggregate proportions shown in Figure 1. In the following section we argue that

optimal matching analysis provides a suitable method for this purpose.

3. Optimal Matching (OM)

Consider the following two sequences of activities (for this example we have

simplified the 72 monthly activities into six yearly activities).

A: School School HE HE HE Employment

B: FE FE FE HE HE HE

We want to classify sequences into groups that are in some sense similar. But how do

we measure the difference or distance between them? OM is a method that measures

distances between sequences such as those above by asking the question ‘how could

we turn one sequence into the other with the least possible cost?’

OM methods, first appearing in the 1970s, were developed for the analysis of protein

and DNA sequences (Abbott and Tsay, 1999). The ‘distance’ measure between

sequences that is the output of OM analysis is a measure of the minimum combination

of replacements (substitutions), insertions and deletions (indels) required to transform

one sequence into another. The development of the techniques is reviewed in Sankoff

and Kruskal (1983). Abbott and Forrest (1986) first applied OM techniques to social

science questions in a study of figure sequences in folk dancing. A number of social

science applications have followed (e.g. Abbott and Forrest, 1990; Halpin and Chan,

1999). Analyses of career paths have been among the most numerous of these

applications (e.g. Abbott and Hrycak, 1990; Halpin and Chan, 1999), although we are

not aware of any study based on similar cohort data for young people.

The first step in the application of OM techniques is to specify a cost matrix for

substitutions and indels. The specification of these cost matrices tends to be somewhat

ad hoc in the literature, although often guided by some simple rules.3 Indeed, it is this

specification stage that has been seen as one of the main weaknesses of OM

techniques.4

We have six alternative activities for each individual-month. We adopt a largely ad

hoc cost matrix based on routes to different levels of employment in later life.

Employment is divided into professional, skilled non-professional, semi-skilled and

unskilled/marginal. These employment types require different levels of education

(level 4 or above for professional, 3 for skilled, 2 for semi-skilled and below 2 for

unskilled/marginal). Substitution costs between activities can be based on the broad

rule that switching between activities within an employment group should cost less than

switching between activities that move individuals between groups. For

example, level 4 education requires HE, which is fed almost entirely by school sixth

forms and FE colleges. Therefore we cost switches between these three states cheaply,

so that an individual at school for two years and then HE is categorised as similar to

an individual at FE for two years and then HE. On the other hand, Level 2

qualifications are far more common outcomes for those young people that have been

in training schemes (see Armstrong and McVicar, 1999, for details). Therefore we

place a high cost on switching between school, HE and training. Switches to and from

employment are cheap, with the exception of switches between employment and

joblessness, to reflect the incidence of temporary jobs for students, years out and so

on.

Table 1 shows one possible upper triangular substitution cost matrix reflecting this set

of assumptions. Although not the only possibility, it does have the property of giving

smaller within-group distances than between-group distances for all but a few outliers.

Indel costs are set at 1.5 (low). This is referred to in what follows as the standard cost

matrix. Identical sequences have a distance coefficient of 0. In the two sequence

example above, the cheapest way of changing sequence B into sequence A is by

substituting school for FE in the first two years, HE for FE in the third year and

3 For example, Abbott and Forrest (1986) have substitution costs reflecting numbers of physical steps that are different between two figures of a dance.

4 Stovel et al. (1996) remark that “the assignment of transformation costs haunts all optimal matching analyses.” [Quoted in Abbott and Tsay, 1999].

employment for HE in the last year. From Table 1 the total cost of this substitution is

4. This is then standardised so that the most distant sequences have a distance

coefficient of 1. Since the maximum inter-sequence distance would be displayed by

six years of joblessness compared to three years of school and then three years HE, a

total substitution cost or indel cost of 18, the normalised distance coefficient between

sequences A and B above would be 4/18 = 0.22.
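The calculation can be sketched in code. The following is an illustrative Python sketch, not the OPTIMIZE program used for the analysis. Because the values in Table 1 are not reproduced here, it uses the unit cost variant described in footnote 5 (every substitution costs 1, indel = 4, maximum distance 6). The names `om_distance` and `unit_cost` are hypothetical labels introduced for illustration.

```python
def om_distance(seq_a, seq_b, sub_cost, indel):
    """Minimum total cost of turning seq_a into seq_b (weighted edit distance)."""
    n, m = len(seq_a), len(seq_b)
    # d[i][j]: cheapest way to turn the first i states of seq_a
    # into the first j states of seq_b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                                     # deletion
                d[i][j - 1] + indel,                                     # insertion
                d[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),  # substitution
            )
    return d[n][m]


def unit_cost(x, y):
    """Unit cost matrix: identical states cost 0, any substitution costs 1."""
    return 0.0 if x == y else 1.0


A = ["School", "School", "HE", "HE", "HE", "Employment"]
B = ["FE", "FE", "FE", "HE", "HE", "HE"]

raw = om_distance(B, A, unit_cost, indel=4.0)
max_distance = 6.0  # maximum distance under unit costs, as stated in footnote 5
normalised = raw / max_distance
print(raw, round(normalised, 2))
```

Under these assumptions the raw cost is 4 and the normalised distance is 4/6, matching the figures in footnote 5. Passing a different `sub_cost` function and indel value is all that is needed to switch to another cost matrix.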

Section 6 discusses the sensitivity of our results to adoption of alternative cost

matrices – the standard cost matrix but with high indel costs (indel = 4) and a unit cost

matrix where all substitutions cost 1 (and indel = 4).5 The various cost matrices are

input to the OPTIMIZE program which then runs the OM analysis (for details of this

software see www.svc.uchicago.edu/users/Abbot/optfdoc.htm). The output is a

(1/2) × 712 × 712 matrix of standardised (and minimised) distances between each pair of sequences.

This distance matrix then forms the input to a cluster analysis as discussed in the

following section.

We could carry out cluster analysis of the sequence data directly – without going

through the OM stage – using some other form of inter-sequence distance. In an

earlier paper (Anyadike-Danes and McVicar, 2000a) we do this with a distance

measure based on correlation of six binary state variables for each month (e.g. FE = 1

if individual i in month m is in FE and 0 otherwise). There are several advantages to

the OM based method however. Firstly, the cluster analysis based on the OM is

computationally simpler. More importantly, the cluster analysis performed on the data

directly cannot account for two sequences that may be very similar but not perfectly

temporally aligned. Consider two career sequences, with ‘E’ representing a year in

employment and ‘U’ a year in unemployment: EUEUEU and UEUEUE. Cluster

analysis applied to the data directly will treat these two sequences as maximally

different. The OM analysis, however, by indel of a single term, will treat these

sequences as quite similar. Another attraction of the OM based cluster analysis is its

versatility, in that assumptions (e.g. of the substitution costs between different

5 This unit cost matrix is adopted in previous applications where there is no clear reason for specifying costs otherwise (e.g. Dijkstra and Taris, 1995). In our two sequence example above the unit cost matrix would also yield a cost of 4 of turning B into A, but the normalisation factor (the maximum distance) would be 6, giving a normalised distance of 0.67 (4/6).

activities) can be easily altered at the input stage in a way that cluster analysis alone

does not allow.
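The temporal alignment point can be illustrated with a short sketch. The indel cost of 1.5 is the standard value given above. The substitution cost of 3 between employment and unemployment is an assumed value chosen only for illustration, since Table 1 is not reproduced here. The function name `om_distance` is likewise hypothetical.

```python
def om_distance(a, b, sub, indel):
    """Cheapest cost of turning sequence a into b using substitutions and indels."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                         # delete x
                curr[j - 1] + indel,                     # insert y
                prev[j - 1] + (0.0 if x == y else sub),  # match or substitute
            ))
        prev = curr
    return prev[-1]


SUB_E_U = 3.0  # assumed substitution cost between E and U (illustrative only)
INDEL = 1.5    # standard indel cost from the text

s1, s2 = "EUEUEU", "UEUEUE"

# Year-by-year comparison: every position differs, so every year is charged.
positional = sum(SUB_E_U for x, y in zip(s1, s2) if x != y)

# OM comparison: the sequences may be shifted against each other.
aligned = om_distance(s1, s2, SUB_E_U, INDEL)
print(positional, aligned)
```

With these assumed costs the year-by-year comparison charges all six positions, whereas the OM distance is the cost of one deletion at the start and one insertion at the end, after which the remaining five years line up exactly.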

4. Cluster Analysis

Cluster analysis is used to create homogeneous groups of cases (or in our example,

sequences) from large samples. Cases are grouped according to some distance

measure between them. This allows complex information to be synthesised into a

small number of clusters with similar patterns, or in our case similar career

trajectories. In other words it offers a method for creating a simple typology of the

transition from school to work, without necessarily discarding any of the information

contained within the data and without imposing any stochastic restrictions ex ante.

Hannan and Doyle (2000) provide a recent application of cluster analysis to career

trajectory data of young Irish people in transition from school to work. They reduce

the monthly categorical data for each individual to six-monthly figures that measure

the proportions of the six-month period that the individual spends in each of the

different activities. This has the effect of turning categorical variables (e.g.

1=education, 2=employment etc.) into pseudo-continuous variables (e.g. number of

months in education in last six months) for which more standard distance measures

can be computed and clusters identified.6
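A small sketch shows what this kind of aggregation retains and what it drops. It is not Hannan and Doyle's code: `six_month_counts` is a hypothetical helper, it reports months per activity rather than proportions, and the two example sequences are invented ('U' for a month of unemployment, 'E' for a month of employment).

```python
from collections import Counter


def six_month_counts(months, window=6):
    """Count the months spent in each activity within consecutive windows."""
    return [Counter(months[i:i + window]) for i in range(0, len(months), window)]


scattered = ["U", "E", "U", "E", "U", "E"]   # three separate one-month spells
contiguous = ["U", "U", "U", "E", "E", "E"]  # one three-month spell

# Both sequences collapse to the same summary: 3 months U, 3 months E.
same_summary = six_month_counts(scattered) == six_month_counts(contiguous)
print(same_summary)
```

The two sequences produce identical six-month summaries, so any distance measure computed on the summaries cannot tell them apart, even though the ordering of the spells differs.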

Cluster analysis has a number of well-known weaknesses (for a recent discussion see

Morgan and Roy, 1995). First, different variants of cluster analysis (e.g. hierarchical

cluster analysis or k-means cluster analysis) can lead to different solutions. Second, it

is not clear how to identify the appropriate number of clusters either ex ante or ex

post. The researcher can therefore exert a great deal of influence over her results through

the way she specifies the problem. The best available practical defence is to subject

the results of a particular cluster analysis to a battery of sensitivity analyses, applying

different cluster methods, using different sub-samples of the data or pre-specifying

6 Notice that this method seems to discard what might be useful information from the data set. For example, their method does not discriminate between three separate monthly spells of unemployment and a contiguous three-month spell of unemployment.

different numbers of clusters, for example. Section 6 presents and discusses such a

sensitivity analysis exercise for our cluster analysis based on the OM solution with the

standard cost matrix.

A further criticism, particularly of cluster analysis of sequences, is that results have

not generally been applied to further explanatory analysis with much success. Where

researchers have tried to use cluster solutions as dependent or independent variables

in further analysis (e.g. regression analysis) they have usually been frustrated (Abbott

and Tsay, 1999). However, our application is one where cluster analysis of sequences

is shown to be amenable to this sort of further analysis – in the form of a regression

model with the cluster solution as the dependent variable. This is discussed in more

detail in Section 5.

In our standard approach, we set the number of clusters to be five.7 We then use the

OM output (distance matrix), based on the standard cost matrix, as the input to a

hierarchical cluster analysis carried out in SPSS. The cluster method here was the

"between-groups linkage", one of the class of "average linkage" methods, using as a

distance measure the "squared Euclidean distance" (see Aldenderfer and Blashfield,

1984, and Section 6 of this paper for a discussion of alternatives). The analysis is

carried out on the full sample of 712 individuals over the full sample period of 72

months. Table 2 presents descriptions of the five clusters along with some information

on the characteristics of their members.
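The clustering step can be sketched as follows. This is a pure Python illustration of between-groups (average) linkage on a precomputed distance matrix, not the SPSS procedure used for the analysis. As a simplification it works on the supplied distances directly rather than on squared Euclidean distances. The function name `average_linkage` and the 5 × 5 toy distance matrix are invented for illustration.

```python
from itertools import combinations


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with between-groups (average) linkage.

    dist is a symmetric matrix of pairwise distances; merging stops
    once n_clusters groups remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between members of the two clusters
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            mean = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy normalised distances between five sequences (invented values)
toy = [
    [0.0, 0.1, 0.8, 0.9, 0.7],
    [0.1, 0.0, 0.7, 0.8, 0.9],
    [0.8, 0.7, 0.0, 0.2, 0.6],
    [0.9, 0.8, 0.2, 0.0, 0.5],
    [0.7, 0.9, 0.6, 0.5, 0.0],
]

three = average_linkage(toy, 3)
two = average_linkage(toy, 2)
print(three, two)
```

In the application itself the input would be the full OM distance matrix and the number of clusters would be set to five. Changing `n_clusters` is the kind of alteration examined in the sensitivity analysis of Section 6.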

Cluster 1, the largest cluster, is dominated by employment. The mean (standard

deviation) number of months in employment within this cluster is 49.4 (13.7)

compared to an overall sample mean (standard deviation) of 32.2 (23.2). HE

dominates Cluster 2. Cluster 3 is less distinct, but dominated by FE.

Cluster 4 is of most interest given the policy context of the research – it is the cluster

dominated by joblessness. The average (standard deviation) amount of joblessness

experienced within this cluster is 43.6 (12.9) months compared to the sample average

7 Our reasons for this are discussed in Section 6.

(standard deviation) of 6.2 (12.4) months. So the cluster analysis creates a distinct

‘unsuccessful transition’ group (see Table 2 for an introduction to the background

characteristics of this group). Finally cluster 5 is dominated equally by long spells of

FE or training. Treating these activities as one (vocational education/training) gives us

a cluster mean (standard deviation) for months in vocational education/training of

34.9 (19.0). In short, the clusters appear well defined and display significantly distinct

properties compared to the full sample. Clusters 3 and 5 are the least well defined (a

point we return to below).

Cluster 4 gives a clear ‘unsuccessful transition’ group displaying high levels of

joblessness. Whilst spells of joblessness can also be found dispersed across the other

clusters, average numbers of months in joblessness for the other clusters are all below

the full sample average. More importantly, there are very few long spells of

joblessness to be found in these other clusters. Instead, jobless spells are more likely

to be characterised by a few months here and a few months there. Given this, there is

enough distinction to believe cluster 4 identifies most, if not quite all, of the

unsuccessful transitions. Many of the members of cluster 4 may not previously have

been identified as ‘at risk’, by, for example, careers advisers, since most start off their

transitions in some form of education, training or employment. The picture is one of a

significant group of young people (7% of the cohort) drifting into excluded positions

between the ages of 17 and 19. Few begin their transitions in joblessness.

From Table 2, the clusters can also be distinguished by the (static) background

characteristics of their members (not used in the cluster analysis itself). All display

gender imbalances, with the employment cluster dominated by males and all others –

particularly the HE and joblessness clusters – female dominated. There are

community differences apparent in cluster 1 (Catholics are under-represented in

employment dominated transitions) and cluster 4 (Catholics are over-represented in

joblessness dominated transitions). Unsurprisingly, qualification levels display the

most variation across clusters, with the unsuccessful transitions dominated by those

with fewer qualifications at 16. Finally, our social class measure (father unemployed

at time of survey) shows clearly that family background can affect chances of

successful transitions (particularly into HE) and unsuccessful transitions into

joblessness.

It is apparent that OM-based cluster analysis can sort the sample into distinct groups,

or ‘types’ of transition. Further, the type of transition that young people experience

appears to be related to their background characteristics. But people that drift into

social exclusion or long term joblessness are often characterised by several correlated

disadvantages – multiple disadvantages in the sociology literature – and if we wish to

disentangle these effects we need to go back to econometric analysis. In the following

section we discuss a logit model where transition type is the dependent variable and a

list of background characteristics are the independent variables. In this way we hope

to quantify what are the most important individual and background (observable)

characteristics that can help cause a young person to experience a successful or

unsuccessful transition.

5. A Logit Model of Transition Type

Abbott and Deviney (1992) attempted unsuccessfully to use the output of OM-based

cluster analysis (sequence type) as a dependent variable in a regression analysis. Their

results were either negligible or counterintuitive (Abbott and Tsay, 1999). However,

Abbott and Tsay (1999) identify several examples of sequence types being used with

varying degrees of success as independent variables (see, e.g. Poole and Holmes,

1995; Carpenter, 1996 and Han and Moen, 1998). Han and Moen, for example, found

‘career pathway type’ to have strong effects on the timing of retirement.

Our clusters appear distinct and appear to be related in many cases to the background

characteristics of the cluster members. There is a fairly well established set of causal

factors in the economics and sociology literatures that can help explain much of the

movement into long-term joblessness among young people (see McVicar et al., 2000,

for a review). Many of these commonly used explanatory factors are present in our

data set. In these two respects, our data seemed to suggest that these sequence types

might be sensibly treated as dependent variables in a regression equation to be

explained by a set of background characteristics. The discussion that follows is based

on the five-cluster solution, in turn based on the standard cost matrix. Section 6

discusses the sensitivity of the logit results to assumptions on substitution costs, indel

costs and number of clusters.

The dependent variable for the logit model is defined as follows:

Yi = 0, if young person has employment-dominated transition (cluster 1),

Yi = 1, if young person has HE-dominated transition (cluster 2),

Yi = 2, if young person has FE-dominated transition (cluster 3),

Yi = 3, if young person has joblessness-dominated transition (cluster 4),

Yi = 4, if young person has long vocational education or training dominated

transition (cluster 5) (1)

The original sample was stratified in such a way that a predetermined number of

young people were in each activity immediately following completion of compulsory

education at 16. Thus the probability of being in the sample in the first place is related

to the model itself, or the sample is choice-based (see, for example, Armstrong,

1999). Estimation therefore uses LIMDEP7’s choice based logit command.8

Estimation results (coefficients and marginal effects at sample means) are given in

Tables 3 and 4.

The individual and background characteristics have significant explanatory power for

the type of transition that young people experience. Because they are easier to

interpret, the following discussion is based on the marginal effects reported in Table

4. The coefficients (Table 3) are reported for completeness.
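To make the link between coefficients and marginal effects concrete, the sketch below assumes a standard multinomial logit with cluster 1 (Yi = 0) as the base outcome. The text does not state the functional form explicitly, so this is an assumption. The coefficient values are invented, the function names are hypothetical, and the sketch ignores the choice-based sampling correction applied in the actual estimation.

```python
import math


def mnl_probs(x, betas):
    """Multinomial logit probabilities; betas[0] is the base outcome (all zeros)."""
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]


def marginal_effects(x, betas, k):
    """dP_j/dx_k = P_j * (beta_jk - sum_m P_m * beta_mk) for each outcome j."""
    p = mnl_probs(x, betas)
    weighted = sum(pm * beta[k] for pm, beta in zip(p, betas))
    return [pj * (beta[k] - weighted) for pj, beta in zip(p, betas)]


# Invented coefficients: [constant, one regressor] for each of five outcomes
betas = [
    [0.0, 0.0],    # cluster 1 (base outcome)
    [-0.5, 1.2],   # cluster 2
    [-0.8, 0.6],   # cluster 3
    [-1.5, -0.9],  # cluster 4
    [-1.0, -0.2],  # cluster 5
]
x_bar = [1.0, 0.4]  # constant and an assumed sample mean of the regressor

probs = mnl_probs(x_bar, betas)
effects = marginal_effects(x_bar, betas, k=1)
print([round(p, 3) for p in probs], [round(e, 3) for e in effects])
```

Because the five probabilities sum to one, the marginal effects of any regressor sum to zero across outcomes: a variable that raises the probability of one transition type must lower the probability of at least one other. This is why the marginal effects can differ in sign from the corresponding coefficients, and why they are easier to interpret.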

Being male increases the likelihood of experiencing an employment-dominated

transition and decreases the likelihood of experiencing a HE, FE or joblessness-

dominated transition. This is consistent with existing evidence for Northern Ireland

and elsewhere that suggests females are more likely than males to stay on in post-

compulsory education, other things being equal (see, e.g. Armstrong, 1999). Existing

evidence also shows that young females in Northern Ireland are generally more likely

8 The sample is weighted by first destination at 16, type of school attended (grammar or secondary) and location of school (5 sub-regional areas). Details of the weighting scheme can be found in McVicar et al. (2000) and Armstrong et al. (1997).

to be jobless than young males (see, e.g. McVicar et al., 2000) – usually the result of

staying home to look after children. Although not officially unemployed, many of

these young women are at risk of increasing social exclusion. Hammer (1997) argues

that in many cases, previous spells of unemployment or joblessness drive young

women to ‘retreat to the home’ rather than persist with job search.

Of particular interest in Northern Ireland are the differences in employment

experiences between the Catholic and Protestant communities, where the rate of

unemployment for Catholic males is consistently higher than the rate for Protestant

males. Catholics have also been found to be more likely to stay on in post-compulsory

education and training rather than enter the labour market directly at 16 (Armstrong et

al., 1997). McVicar et al. (2000) find that this is reflected in lower numbers of

jobless among Catholics for the first two years of transition than among Protestants.

At 18+ however, joblessness among Catholics, particularly jobless spells of long

duration, increases to higher levels than among Protestants. This is reflected in the

marginal effects of the Catholic dummy variable presented in Table 4. Other things

being equal, Catholics are significantly more likely to experience an unsuccessful

(joblessness-dominated) transition than Protestants. As we would expect, they are less

likely to experience an employment-dominated transition and more likely to

experience a HE-dominated transition.

The GCSE5 variable is a binary dummy for having five or more GCSEs at grades A-

C at the end of compulsory education. These examinations are generally taken at 16

and 5 grades at A-C is the traditional cut-off point for progression onto further

academic education (A-Levels) and then on into HE. We would expect this to have

significant effects on transition paths and this is indeed the case. There are strong

positive effects on the likelihood of a HE or FE dominated transition and strong

negative effects on the likelihood of an employment or joblessness dominated

transition. This variable is not only acting as a qualifications variable, but is likely to

be capturing part of the effect of the (unobserved) raw ability of the young people –

and we would expect the more ‘able’ young people to be less likely to experience an

unsuccessful transition. The grammar school variable may also be acting in this way

since almost all of those young people that attended grammar schools at 11-16 had

previously passed an 11+ examination. Again, the direction of the marginal effects

supports this interpretation – young people that have attended grammar schools are

more likely to experience HE-dominated transitions and less likely to experience

employment or joblessness dominated transitions.

There are three family background variables included in the model. Firstly, a dummy

for having an unemployed father (at the time of sweep 2 of the survey). Secondly, a

dummy for having a father whose current or most recent job was professional,

managerial or related and thirdly a dummy for whether the young person lived with

both parents at the age of 18 (at the time of sweep 1 of the survey). These all have

effects that are intuitive and supported by existing studies (see, e.g. McVicar et al.,

2000). Young people with unemployed fathers are more likely to experience

unsuccessful joblessness-dominated transitions and less likely to experience

employment-dominated transitions. They are also more likely to experience

transitions characterised by prolonged periods in FE or training. Young people with

professional fathers are more likely to experience HE or FE dominated transitions and

less likely to move directly into employment at 16 or at 18 and less likely to

experience joblessness-dominated transitions. Young people that live with single

parents are more likely to experience joblessness-dominated transitions or

employment-dominated transitions and less likely to stay in education. They are more

likely, however, to experience a prolonged spell of vocational education or training

than young people living with both parents.

Finally, there are four sub-regional dummies (corresponding to Education and Library

Board areas (ELBs)). Existing evidence suggests Belfast and the West of Northern

Ireland are disadvantaged in terms of availability of jobs in local labour markets and

in terms of social need more generally (see, e.g. Robson et al., 1994). The omitted

area here is the Northeast of Northern Ireland, characterised by the lowest

unemployment rates in the region. Therefore marginal effects are expressed relative to

the transition patterns of young people from the Northeast. Consistent with existing

evidence (e.g. McVicar et al., 2000), young people from the West and from Belfast

are more likely than their counterparts in the Northeast to experience unsuccessful

transitions. Those in the West are also less likely to experience an employment-

dominated transition, which is likely to reflect the lack of job opportunities. Instead,


young people from the West tend towards FE or HE, as do those from the South and

Southeast of the region.9

The identification of the five separate states for the dependent variable can be tested

by Cramer-Ridder tests of pooling outcomes (see Cramer and Ridder, 1991).10 In this

case we carry out three separate tests for the aggregation of clusters 1 and 5, clusters 2

and 3 and clusters 3 and 5. Recall that cluster 1 is the employment-dominated cluster

and that cluster 5 is the ‘long vocational education/training followed by short

employment’ cluster. Cluster 2 is the HE-dominated cluster and cluster 3 the FE-

dominated cluster (but containing some members that go on to HE). Of all the

clusters, 3 and 5 appear to be the least well determined, as discussed in the previous

section. In each case the aggregation of these states is not supported by the tests so we

retain the five-state disaggregated dependent variable as shown in (1). The test

statistics are 30.0, 45.2 and 9.7 respectively and are distributed chi-square with 12

degrees of freedom (the number of parameter restrictions in the model) giving a 5%

critical value of 5.23. The null hypothesis is the pooling of states. In other words, the

clusters identified by the cluster analysis are supported as being distinct ‘types’ for the

purposes of the logit model.
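The mechanics of a likelihood-ratio pooling test of this kind can be sketched in a few lines of Python. The sketch below takes the three reported test statistics and the 12 degrees of freedom from the text; everything else is an assumption, in particular the use of the conventional upper-tail rejection region. Under that convention the 5% critical value for 12 degrees of freedom is about 21.03, whereas 5.23 is the lower 5% point of the same distribution, so readers may wish to check each statistic against whichever convention is intended. The closed-form tail probability used here holds only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def lr_statistic(loglik_disaggregated, loglik_pooled):
    """Likelihood-ratio statistic: twice the gap between the two log-likelihoods."""
    return 2.0 * (loglik_disaggregated - loglik_pooled)


DF = 12  # number of parameter restrictions, as stated in the text
REPORTED = {
    "clusters 1 and 5": 30.0,
    "clusters 2 and 3": 45.2,
    "clusters 3 and 5": 9.7,
}

p_values = {pair: chi2_sf_even_df(stat, DF) for pair, stat in REPORTED.items()}
for pair, p in p_values.items():
    print(f"{pair}: statistic {REPORTED[pair]:.1f}, upper-tail p-value {p:.4f}")
```

The function `lr_statistic` is included only to show where the statistic comes from; the log-likelihoods themselves are not reported in the paper, so none are supplied here.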

In the introduction to this paper we set out to ‘predict’ which young people were more

likely to experience unsuccessful transitions, dominated by joblessness, and which

were likely to experience successful transitions, not dominated by joblessness. The

logit model suggests young people with the following characteristics, in order of

magnitude of the marginal effects, have higher chances of unsuccessful transitions:

• Young people with poor qualifications at 16,

• Young people from the West of Northern Ireland or from Belfast and not from the Southeast or Northeast of Northern Ireland,

• Young people with unemployed fathers,

• Young people not living with both parents at 18,

• Females,

• Catholics,

• Young people whose fathers are not in the professional, managerial or related employment category.

9 The tendency for those from Belfast to reject the HE route may be a result of imbalance in the survey that is not fully accounted for by the weighting regime because of missing careers service records for grammar school pupils from the Belfast area. These records were used to derive the original sample for sweep 1 of the Status Zero Survey. The weighted sample does not fully compensate for these missing records because of small sample sizes in individual cells in the weighting matrix. In some cases these have been merged with adjoining cells to avoid over-large weights on a few individuals.

10 The Cramer-Ridder test is a likelihood-ratio test comparing the log-likelihoods of the model when some states of the dependent variable are aggregated and when they are left disaggregated.

Abbott and Tsay (1999) remark that ‘the proof of the classificatory pudding is in the

explanatory eating’. In as much as this is the case, the results of the logit model of

transition types described here – which are generally significant, intuitive and

consistent with existing applied econometric research – support our OM and cluster

analysis-based typology of the transition from school to work. How much we can

claim for these results depends to a large extent on how fragile or robust they are to

making alternative assumptions at each stage. This is discussed in the next section.

6. Sensitivity Analysis

Sensitivity to Assumptions on Substitution and Indel Costs in the OM Analysis

The five-cluster solutions based on the OM analysis using the standard cost matrix but

with high indel costs and using the unit cost matrix are similar, but not identical, to

the five-cluster solution outlined in Section 4 for the standard cost matrix with low

indel costs. The clusters that are identified are very similar in nature (e.g. a large

employment-dominated cluster, a smaller joblessness-dominated cluster etc.) to those

outlined in Section 4, but cluster size and membership are not identical. In particular,

using the unit cost matrix gives a larger – but slightly less distinct – joblessness

cluster. It has 101 members, with average joblessness of 27.6 months and standard

deviation of 19.4 months compared to an average of 43.4 and a standard deviation of

13.0 based on the standard cost matrix. This reduction in the distinction of the

joblessness cluster is a result of the lower substitution costs between joblessness and

other activities.


The main difference between the standard cost matrix solutions using the high and

the low indel costs is the reduced size of cluster 5 under the high indel cost. Most of the members

of cluster 5 from the low indel cost analysis are now located in cluster 3. Given that

clusters 3 and 5 appear to be the least distinct, the fact that their memberships are

the most sensitive to changes in assumptions is not surprising. All else, however,

appears robust.
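To make the role of the substitution and indel costs concrete, the following Python sketch computes an optimal matching distance between two activity sequences by dynamic programming. The substitution costs are those of the standard cost matrix in Table 1, and the unit cost matrix sets every substitution cost to one. The numerical indel costs (`LOW_INDEL`, `HIGH_INDEL`) and the toy sequences are hypothetical, chosen only to show the mechanism; they are not the values or data used in the analysis.

```python
# Substitution costs from Table 1 (standard cost matrix), upper triangle only.
STANDARD = {
    ("HE", "School"): 1, ("HE", "FE"): 1, ("HE", "Training"): 2,
    ("HE", "Employment"): 1, ("HE", "Joblessness"): 3,
    ("School", "FE"): 1, ("School", "Training"): 2,
    ("School", "Employment"): 1, ("School", "Joblessness"): 3,
    ("FE", "Training"): 2, ("FE", "Employment"): 1, ("FE", "Joblessness"): 2,
    ("Training", "Employment"): 1, ("Training", "Joblessness"): 1,
    ("Employment", "Joblessness"): 2,
}

LOW_INDEL = 1   # HYPOTHETICAL value, for illustration only
HIGH_INDEL = 3  # HYPOTHETICAL value, for illustration only


def substitution_cost(a, b, unit=False):
    """Cost of replacing state a with state b (symmetric, zero if identical)."""
    if a == b:
        return 0
    if unit:
        return 1
    return STANDARD[(a, b)] if (a, b) in STANDARD else STANDARD[(b, a)]


def om_distance(seq_a, seq_b, indel, unit=False):
    """Minimum total cost of turning seq_a into seq_b by insertions,
    deletions and substitutions (standard edit-distance recursion)."""
    previous = [j * indel for j in range(len(seq_b) + 1)]
    for i, state_a in enumerate(seq_a, start=1):
        current = [i * indel]
        for j, state_b in enumerate(seq_b, start=1):
            current.append(min(
                previous[j] + indel,                      # delete state_a
                current[j - 1] + indel,                   # insert state_b
                previous[j - 1] + substitution_cost(state_a, state_b, unit),
            ))
        previous = current
    return previous[-1]


# Toy sequences (hypothetical): the same spell of joblessness, shifted in time.
early_jobless = ["Joblessness"] + ["Employment"] * 4
late_jobless = ["Employment"] * 4 + ["Joblessness"]

print(om_distance(early_jobless, late_jobless, indel=LOW_INDEL))   # 2
print(om_distance(early_jobless, late_jobless, indel=HIGH_INDEL))  # 4
```

With the low indel cost the two sequences are aligned by deleting the early spell and inserting the late one; with the high indel cost two substitutions are cheaper, so the measured distance between the same pair of sequences rises. Changes of this kind in pairwise distances are what can move individuals between neighbouring clusters.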

Table 5 reports the marginal effects of the logit model based on the unit cost matrix

five-cluster solution. Overall the results are similar to those based on the standard cost

matrix. For example, all the marginal effects for clusters 1 and 2 for both models

share the same signs. The marginal effects for cluster 4 also share the same signs with

the exception of the male dummy, which displays a positive effect in the unit cost

matrix model. The ‘loosening’ of this cluster – which is characterised by more spells

of employment interspersed with spells of joblessness – is the likely explanation for

this. There are some differences between the models in the marginal effects for

clusters 3 and 5 that result from switching members.

Sensitivity to Specification of the Number of Clusters

In the analysis above we have set the number of clusters at five. Our reasons for this

stem from the trade-off between the number of values the dependent variable can take

in the logit model (not too many), the size of each cluster (not too small) and the level

of distinction of each cluster (not too few clusters).11 But what happens if we specify

three clusters, or ten?

Figure 2 shows each stage of the evolution of clusters from the two-cluster solution to

the ten-cluster solution (using hierarchical cluster analysis). It is the HE-dominated

cluster (cluster 2 in our five-cluster solution) that is the first to break off. This remains

intact up to ten clusters. The next to break off (at the three-cluster solution) is the long

vocational education and training group (cluster 5 in our five-cluster solution). This

also remains intact until the ten-cluster solution where a small group with experience

of long-ish spells of joblessness splits off. At the four-cluster solution, the FE group

11 The more values the dependent variable can take the more the number of parameters to estimate in the model proliferates.


(cluster 3 in our five-cluster solution) separates. This group splits again at the six-

cluster solution into those that go on to HE from FE and those that do not. That this

group is the most sensitive to our choice of five clusters is perhaps not surprising

given its relative lack of distinction. The joblessness cluster first breaks off from the

main cluster at the five-cluster solution but remains together until the eight-cluster

solution where a very small sub-group that stayed on at school before joblessness

splits off.
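The nesting of the k-cluster solutions described above is a general property of agglomerative hierarchical clustering, which the following Python sketch illustrates. It implements average linkage, which we take to correspond to the "between-groups linkage" option, on a small distance matrix. The six toy points and their distances are hypothetical; in the analysis itself the distances would be the pairwise OM distances between individuals' sequences.

```python
def average_linkage(dist, k_min=1):
    """Agglomerate until k_min clusters remain.

    dist is a symmetric n x n matrix (list of lists). Returns a dict mapping
    each number of clusters k to the clusters (lists of indices) at that stage.
    """
    clusters = [[i] for i in range(len(dist))]
    solutions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > k_min:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)  # mean between-cluster distance
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        solutions[len(clusters)] = [sorted(c) for c in clusters]
    return solutions


# HYPOTHETICAL toy data: six individuals placed on a line.
positions = [0, 1, 10, 11, 30, 31]
toy_dist = [[abs(p - q) for q in positions] for p in positions]

solutions = average_linkage(toy_dist)
for k in sorted(solutions):
    print(k, sorted(solutions[k]))
```

Reading the output from the smallest k upwards shows each solution splitting exactly one cluster of the previous solution, which is why a group such as the HE-dominated cluster can break off early and then remain intact as k increases.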

Evidently, we need to specify at least five clusters in order to identify what we have

labelled the ‘unsuccessful’ transitions. Moreover, Cramer-Ridder tests rule out pooling

the joblessness and employment categories in the logit model so we would be unwise

to reduce the number of clusters below five.12 Increasing the disaggregation

(increasing the number of clusters) has little effect on this joblessness cluster.

So, we cannot estimate a logit model with fewer than five clusters or we will lose

the unsuccessful transition cluster. But what of estimating a model with more than

five clusters? The problem here is that adding clusters leads to a proliferation of the

parameters the model has to estimate, so we are reluctant to go much above six

clusters. However, the logit model for the six-cluster case (where the FE cluster has

split into two groups) gives very similar results to those presented in Tables 3 and 4

for the five-cluster case. The coefficients and marginal effects change very little for

the joblessness cluster and for the other clusters. The largest change is, unsurprisingly,

seen in the estimates for cluster 3. However, even these estimates appear broadly

robust, probably since the group that breaks off from cluster 3 is small. Of course, it

would have been very helpful had a Cramer-Ridder test suggested pooling the newly

split FE clusters to give us the five-cluster logit model – unfortunately it does not.

Sensitivity to Particular Cluster Analysis Techniques

Different clustering techniques can sometimes lead to different outcomes. So far all

our cluster analysis has been based on “between-groups linkage” hierarchical

clustering. Other hierarchical methods give similar patterns of results with one or two

12 The Cramer-Ridder test for pooling clusters 1 and 4 in the five-cluster logit model gives a test statistic of 56.4 with critical value 5.23 so the null hypothesis of pooling states is rejected.


exceptions.13 Here we give details of the clustering exercise based on the standard

cost matrix using k-means clustering. Once again we set the number of clusters equal

to five.

The k-means technique also gives broadly similar clusters to the hierarchical

technique – in particular the joblessness and HE-dominated clusters are very similar.

In this case, the joblessness cluster has a mean number of months in joblessness of

42.5 and a standard deviation of 12.4. However, the employment-dominated cluster

from our earlier analysis is here spread over two clusters of roughly equal size. One is

characterised by two or three years of education or training and then employment. The

other is characterised by less than two years of education or training and then

employment (including those that enter employment directly). Clusters 3 and 5 from

the earlier analysis are here pooled into a larger cluster characterised by at least four

years of education or training, mostly followed by one or two years of employment.

Since both hierarchical and k-means methods identify a very similar and clear

joblessness cluster, along with broadly similar other clusters, our results would appear

robust to choice of particular method. As we would expect, the logit model shows

very similar marginal effects for the joblessness cluster based on k-means to those

presented in Table 4.

Robustness across Samples

All analysis has so far been based on the full sample available from the Status Zero

Survey. There is no other similar data (e.g. data for another cohort) with which we can

test the robustness of our results across samples. However, we can sub-divide our

current sample and re-run the analysis to check for sensitivity. The sample is split in

half randomly and the OM and cluster analysis, specifying five clusters, is repeated.

13 The analysis was repeated using the six other methods available within SPSS: within-groups linkage; nearest neighbour; furthest neighbour; centroid clustering; median clustering; and Ward's method (see Aldenderfer and Blashfield, 1984, for a description of these methods). For four out of the six methods the five-cluster solution had broadly similar characteristics to that in our standard case. In each of them there were clear employment, unemployment and HE clusters, although the boundaries between our standard case clusters 3 and 5 (the FE and FE/Training clusters) varied somewhat. Just two methods produced very different results: nearest neighbour and furthest neighbour. Single linkage methods such as these two are known, though, to be prone to producing anomalous results (here indeed the nearest neighbour method put all but a handful of cases in the same cluster).


The clusters identified in each half-sample are broadly similar to those identified in

the full sample. Both half-samples display HE-dominated clusters of roughly equal

size. Both display joblessness-dominated clusters, also of roughly equal size. In one

case, the employment-dominated cluster is split in two (longer and shorter initial

spells in education or training) leaving a single FE-dominated cluster. In the other

case, there is a single employment-dominated cluster and two FE/training dominated

clusters. The jobless cluster in the first half-sample has mean months of joblessness of

46.6 with standard deviation 11.6. The second half-sample joblessness cluster has

mean months of joblessness 46.2 with standard deviation 8.9. In other words, the

joblessness cluster for each half-sample displays almost identical characteristics to the

full-sample joblessness cluster. The logit model is estimated for each half sample.

Given the similarity in the cluster solutions, the logit solutions are unsurprisingly also

similar.14
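The split-half check described in this subsection can be sketched as follows in Python. The identifiers, the seed and the example values are hypothetical; the sketch shows only the random division of the sample and the summary statistic (mean and standard deviation of months in joblessness) used above to compare clusters across half-samples. The OM and cluster analysis would then be re-run separately on each half.

```python
import random
import statistics


def split_half(ids, seed=0):
    """Randomly divide a collection of identifiers into two halves."""
    rng = random.Random(seed)   # fixed seed so the split can be reproduced
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return sorted(shuffled[:mid]), sorted(shuffled[mid:])


def mean_sd(months):
    """Mean and sample standard deviation of months spent in an activity."""
    return statistics.mean(months), statistics.stdev(months)


# HYPOTHETICAL identifiers for a sample of 712 young people.
first_half, second_half = split_half(range(712), seed=1)
print(len(first_half), len(second_half))

# HYPOTHETICAL months of joblessness for members of one jobless cluster.
print(mean_sd([40.0, 50.0, 45.0, 52.0]))
```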

7. Concluding Remarks

This paper shows how sequence techniques can be used to create a typology of

transitions from school to work – in particular distinguishing unsuccessful transitions

dominated by joblessness from broadly successful transitions dominated by

employment or extended education. This typology is then used here as the basis of a

logit model where young people’s transition types can be ‘predicted’ from their

background characteristics. The results of this exercise are intuitive, consistent with

existing evidence based on more standard stochastic approaches and generally robust

to a large number of specification changes.

The paper adds value to the existing literature in two main respects. Firstly, on the

technical side, we successfully use the output of cluster analysis of sequences, itself

based on OM analysis, as the dependent variable in a regression equation for

transition type. Previous attempts at using cluster analyses of sequences in this way

have met with little success. Our intuition for why we might have made more

progress on this front is that our data – and our problem – are ideally suited to such an

exercise.

14 More details and results from the sensitivity analyses discussed in Section 6 are available from the authors on request.


Secondly, the typology of unsuccessful transitions and the results showing how young

people’s transition types depend on their individual and family background

characteristics can contribute to an ‘early warning system’ for policy makers

concerned with reducing the drift of significant numbers of young people into long-

term joblessness and social exclusion. We are not quite at the stage where we can put

background characteristics in one end and get an exact prediction of transition

patterns out at the other end – but we can identify young people that are more at risk

of experiencing unsuccessful transitions. Further research may reduce the uncertainty.


References

Abbott, A. and Deviney, S. (1992). ‘The welfare state as transnational event.’ Social Science History, 16, pp 245-74.

Abbott, A. and Forrest, J. (1986). ‘Optimal matching methods for historical data.’ Journal of Interdisciplinary History, 16, pp 473-96.

Abbott, A. and Forrest, J. (1990). ‘The optimal matching method for anthropological data.’ Journal of Quantitative Anthropology, 2, pp 151-70.

Abbott, A. and Hrycak, A. (1990). ‘Measuring resemblance in social sequences.’ American Journal of Sociology, 96, pp 144-85.

Abbott, A. and Tsay, A. (1999). ‘Sequence analysis and optimal matching methods in sociology: review and prospect.’ Mimeo, University of Chicago.

Aldenderfer, M. S. and Blashfield, R. K. (1984). Cluster Analysis. Sage Publications, London.

Anyadike-Danes, M. and McVicar, D. (2000a). ‘Characterising the transition from school to work in Northern Ireland: alternative data analytic strategies.’ Paper presented to the Conference on Applied Statistics in Ireland, Rosslare, County Wexford, 17th-19th May, 2000.

Anyadike-Danes, M. and McVicar, D. (2000b). ‘Markov models and the transition from school to work.’ Mimeo, Northern Ireland Economic Research Centre, Belfast, UK.

Armstrong, D. (1999). ‘School performance and staying-on: A micro analysis for Northern Ireland.’ Manchester School, 67, 2, pp 203-230.

Armstrong, D. and McVicar, D. (1999). ‘Value added in further education and vocational training in Northern Ireland.’ Applied Economics, forthcoming.

Armstrong, D., Istance, D., Loudon, R., McCready, S., Rees, G. and Wilson, D. (1997). Status 0: A socio-economic study of young people on the margin. Training and Employment Agency, Belfast.

Carpenter, D. (1996). Corporate identity and administrative capacity in executive departments. Unpublished PhD dissertation, University of Chicago.

Cramer, J. S. and Ridder, G. (1991). ‘Pooling states in the multinomial logit model.’ Journal of Econometrics, 47, pp 267-72.

Dijkstra, W. and Taris, T. (1995). ‘Measuring the agreement between sequences.’ Sociological Methods and Research, 24, pp 214-231.

Halpin, B. and Chan, T-W. (1998). ‘Class careers as sequences.’ European Sociological Review, 14, 2, pp 111-30.

Hammer, T. (1997). ‘History dependence in youth unemployment.’ European Sociological Review, 13, 1, pp 17-33.

Han, S-K. and Moen, P. (1998). ‘Clocking out.’ Mimeo, Cornell University.

Hannan, D. F. and Doyle, A. (2000). ‘Changing school-to-work transitions: three cohorts 1982-1987; 1986-1992; 1992-1998.’ Mimeo, ESRI Dublin, Paper for ESRI Seminar, 17 February 2000.

McVicar, D., Loudon, R., McCready, S., Armstrong, D. and Rees, G. (2000). Young people and social exclusion in Northern Ireland: Status 0 four years on. Training and Employment Agency, Belfast.

Morgan, B. J. T. and Ray, A. P. G. (1995). ‘Non-uniqueness and inversions in cluster analysis.’ Applied Statistics, 44, pp 117-34.

Poole, M. S. and Holmes, M. E. (1995). ‘Decision development in computer assisted group decision-making.’ Human Communication Research, 22, pp 90-127.

Robson, B., Bradford, M. and Deas, I. (1994). ‘Relative deprivation in Northern Ireland.’ Occasional Paper 28, Policy, Planning and Research Unit, Department of Finance and Personnel, Belfast.

Sankoff, D. and Kruskal, J. B. eds (1983). Time warps, string edits and macromolecules. Reading MA: Addison Wesley.

Stovel, K., Savage, M. and Bearman, P. (1996). ‘Ascription into achievement.’ American Journal of Sociology, 102, pp 358-99.


Figures and Tables


Figure 1: Cohort Activities, October 1993 - March 1999, % of Cohort

[Figure not reproduced. Vertical axis: 0% to 100% of cohort. Series: jobless, train, emp, fe, he, school.]

Source: NIERC. Notes: The graph shows the proportion of the weighted sample in each activity each month. Activities are primary activity during the given month.

Figure 2: Dendrogram for k-Cluster Solutions, k=1-10

[Figure not reproduced. The splits recoverable from the original diagram, from the full sample down to the ten-cluster solution, are listed below; indentation shows which group each sub-group splits from.]

N=All
    N=136 (2) HE dominated
    N=576 Non-HE
        N=83 (5) Long vocational education or training
            N=67 Others
            N=16 Longer jobless
        N=493 Others
            N=77 (3) FE
                N=26 FE-HE
                N=51 FE, non-HE
                    N=41 Purer FE
                    N=10 Non-FE
            N=416 Employment and unemployment
                N=369 (1) Employment
                    N=97 Employment direct
                    N=272 Employment non-direct
                N=47 (4) Joblessness dominated
                    N=5 School then jobless
                    N=42 Other jobless

Note: Members of the five-cluster solution are identified by cluster numbers in parentheses.


Table 1: Substitution Costs for the Standard Cost Matrix

              School   FE   Training   Employment   Joblessness
HE               1      1       2           1            3
School                  1       2           1            3
FE                              2           1            2
Training                                    1            1
Employment                                               2


Table 2: Five Cluster Solution – Descriptions and Background Characteristics

Mean number of months in each activity (standard deviation in parentheses):

Cluster  N    Description                        School        FE            Employment    Training      Joblessness   HE
1        308  Employment dominated               1.4 (4.5)     8.4 (10.9)    49.4 (13.7)   8.8 (12.2)    3.8 (7.1)     0.1 (0.9)
2        184  HE dominated                       18.7 (12.6)   11.3 (14.8)   3.3 (5.1)     0.2 (1.3)     1.5 (3.4)     37.1 (9.1)
3        79   FE dominated                       7 (11.6)      33.5 (17.4)   13.3 (8.3)    3.2 (7.4)     3.9 (7.4)     11 (13.9)
4        47   Joblessness dominated              3.3 (7.4)     4.9 (7.6)     12.4 (12.2)   7.9 (9.2)     43.6 (12.9)   0 (0)
5        95   Long FE/training then employment   6.0 (10.8)    20.7 (17.2)   22.8 (11.0)   16.3 (20.1)   5.5 (8.4)     0.7 (2.7)
Whole    712                                     7.2 (8.5)     11.7 (15.8)   32.2 (23.2)   6.9 (8.3)     6.2 (12.4)    8.4 (15.6)
sample

Background characteristics (% of cluster):

Cluster        Male %   Catholic %   5+ GCSEs at 16 %   Father unemployed %
1              57       43           19                 14
2              41       50           87                 7
3              46       46           43                 21
4              41       59           8                  37
5              44       47           30                 28
Whole sample   52       48           37                 16

Note: The sample numbers and cluster proportions are based on the weighted sample (see McVicar et al., 2000, for weighting details).


Table 3: Logit Model for Five-Cluster Solution – Coefficients (Relative Probabilities)

            Cluster 2 vs   Cluster 3 vs   Cluster 4 vs   Cluster 5 vs
            Cluster 1      Cluster 1      Cluster 1      Cluster 1
Constant    -3.55***       -3.19***       -1.87***       -1.90***
Male        -0.27          -0.29          -0.82**        -0.49*
Catholic     0.21           0.05           0.45           0.26
GCSE5        2.98***        1.16***       -0.84           0.68**
Grammar      1.72***        0.23          -0.21           0.25
Funemp       0.34           0.82**         1.02***        1.11***
Fprof        0.93***        0.97***        0.13           0.89***
Livboth      0.43           1.03***       -0.63*         -0.35
Belfast     -1.10*          0.25           0.78           1.00**
West         0.55           0.69           0.91           0.14
South        0.58*          0.43           0.20           0.10
Southeast    0.83**         0.86**        -1.31           0.71*

Notes: * = significant at 10%, ** = significant at 5%, *** = significant at 1%.


Table 4: Logit Model for Five-Cluster Solution – Marginal Effects at Sample Means

            Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
Constant     .69***     -.33***     -.25***     -.02***     -.09
Male         .10***     -.01*       -.01**      -.03***     -.05
Catholic    -.05***      .02**      -.01         .01***      .03
GCSE5       -.35***      .34***      .07***     -.06***      .01
Grammar     -.16***      .21***     -.01        -.02***     -.01
Funemp      -.19***     -.01         .06***      .03***      .12**
Fprof       -.22***      .08***      .08***     -.01***      .08*
Livboth     -.06***      .05***      .13***     -.03***     -.08**
Belfast     -.03        -.17***      .03***      .03***      .15**
West        -.12***      .05***      .06***      .03***     -.02
South       -.09***      .06***      .04***      .01        -.01
Southeast   -.15***      .08***      .08***     -.07***      .07

Pseudo R2    .22

Notes: * = significant at 10%, ** = significant at 5%, *** = significant at 1%.


Table 5: Logit Model for Five-Cluster Solution based on Unit Cost Matrix – Marginal Effects at Sample Means

            Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
Cluster n    326         161         38          101         86
Constant     .49***     -.40***     -.07***     -.01        -.01
Male         .15***     -.03***      .01***      .03***     -.16***
Catholic    -.03         .07***     -.01***      .01**      -.05
GCSE5       -.31***      .45***     -.04***     -.10***      .01
Grammar     -.26***      .28***     -.01***     -.11***      .10*
Funemp      -.08**       .01         .03***      .07***     -.03
Fprof       -.12***      .10***      .01***     -.03***      .05
Livboth     -.02         .11***      .01***     -.05***     -.04
Belfast     -.11***     -.07***     -.05***      .02**       .22***
West        -.16***      .03         .03***      .07***      .03
South       -.08**       .03         .02***      .02**       .01
Southeast   -.13***      .11***      .02***     -.15***      .15**

Pseudo R2    .22

Notes: * = significant at 10%, ** = significant at 5%, *** = significant at 1%.