A Semiparametric Panel Approach to Mortality Modeling



Insurance: Mathematics and Economics 61 (2015) 264–270


A semiparametric panel approach to mortality modeling

Han Li ∗, Colin O’Hare, Xibin Zhang

Department of Econometrics and Business Statistics, Monash University, Melbourne, VIC 3800, Australia

Article info

Article history: Received October 2014; Received in revised form February 2015; Accepted 4 February 2015; Available online 12 February 2015

Keywords: Mortality; Stochastic models; Semiparametric; Panel models; Forecasting

Abstract

During the past twenty years, there has been a rapid growth in life expectancy and an increased attention on funding for old age. Attempts to forecast improving life expectancy have been boosted by the development of stochastic mortality modeling, for example the Cairns–Blake–Dowd (CBD) 2006 model. The most common optimization method for these models is maximum likelihood estimation (MLE) which relies on the assumption that the number of deaths follows a Poisson distribution. However, several recent studies have found that the true underlying distribution of death data is overdispersed in nature (see Cairns et al. 2009 and Dowd et al. 2010). Semiparametric models have been applied to many areas in economics but there are very few applications of such models in mortality modeling. In this paper we propose a local linear panel fitting methodology to the CBD model which would free the Poisson assumption on number of deaths. The parameters in the CBD model will be considered as smooth functions of time instead of being treated as a bivariate random walk with drift process in the current literature. Using the mortality data of several developed countries, we find that the proposed estimation methods provide comparable fitting results with the MLE method but without the need of additional assumptions on number of deaths. Further, the 5-year-ahead forecasting results show that our method significantly improves the accuracy of the forecast.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Mortality modeling has attracted an increasing amount of attention in recent years. Its importance is not only in demography analysis, but also in actuarial science, social and public funding and population dynamics. The changes in regulation around insurance systems, increasing attention paid by capital markets to longevity risks and government reforms of welfare systems have all increased the need to understand mortality. During the past few decades life expectancy has been improving at approximately three years per decade, resulting in increased pressure on personal and public finances. Mortality and longevity risk have therefore become a significant risk faced by governments, insurance companies, pension providers and individuals.

Many attempts have been made to model mortality. Indeed, mortality modeling has an extremely long history going back to the mid 1800s, including the Gompertz model for example. Early models were deterministic in nature owing to the lack of computing ability and so did not allow for uncertainty in forecasts. Since the early 1990s, with increasing computing power and newer techniques, stochastic modeling of mortality has become more prevalent. One of the earliest stochastic models was that of Lee and Carter (1992), who developed a one factor time series model of mortality rates which they applied to US data. The success of this model is perhaps encapsulated in the fact that there have been many, many extensions and modifications of the Lee–Carter model. Among different stochastic models a strong contender has been the Cairns–Blake–Dowd (CBD) model introduced by Cairns et al. (2006). In this paper we focus on this model as an example of a time series mortality model. We will express the CBD model as a semiparametric panel model and propose a local linear estimation method to fit the model to the data, freeing the model from any distributional assumptions.

∗ Corresponding author: H. Li.

http://dx.doi.org/10.1016/j.insmatheco.2015.02.002
0167-6687/© 2015 Elsevier B.V. All rights reserved.

The most popular and widely used estimation method for stochastic mortality models is the maximum likelihood estimation (MLE) proposed by Brouhns et al. (2002). This method relies on an important assumption that the number of deaths follows a Poisson distribution. Several recent studies have found that the true underlying distribution of death data has a variance which is much greater than the mean (see Cairns et al., 2009 and Dowd et al., 2010). This ‘‘overdispersion’’ feature exists in many developed countries such as the UK and the US. This indicates that the current method may have underestimated the variability of future mortality rates. Moreover, the parameters in the CBD model are assumed to follow a bivariate random walk with drift process. However, several researchers have questioned the appropriateness of using this process to project future mortality rates (Sweeting, 2011; Booth et al., 2002; De Jong and Tickle, 2006).

In this paper we propose a new estimation method for future mortality rates that does not require the Poisson assumption on the number of deaths. The CBD model is reframed as a time-varying coefficient panel model, and the parameters are treated as smooth functions of time. Approximated functional forms of the parameters will also be given. For future mortality projection, we introduce firstly a local linear forecasting method and then a parametric forecasting method based on the approximated functional forms of the parameters.

Using the mortality data of several developed countries over the period 1950–2009 to fit our model, we find that the proposed estimation methods provide comparable fitting results with the MLE approach but without the need for any assumptions on the error density. Further, we use backtesting to assess the performance of the two forecasting methods we introduce. The 5-year-ahead forecasting results show that both of our methods outperform the bivariate random walk with drift method in the majority of scenarios.

The rest of the paper is organized as follows. In Section 2, a brief literature review of stochastic mortality modeling is given. Section 3 introduces the local linear estimation method for time-varying coefficient panel models together with the two forecasting methods. Section 4 describes the structure of data used in this paper. Section 5 gives empirical fitting and forecasting results of the proposed methods. Conclusions are drawn in Section 6.

2. Literature review

During the past twenty years, the interest in mortality and longevity risk has grown rapidly. The development of stochastic mortality modeling and the introduction of mortality linked financial products have provided new motivation to understand mortality risk. An early stochastic model for mortality rates was introduced by Lee and Carter (1992) and after that, many extensions and modifications of the Lee–Carter model were developed (for example Brouhns et al., 2002; Renshaw and Haberman, 2006; Cairns et al., 2006; Plat, 2009 and O’Hare and Li, 2012). This section will start by defining some important notation used in the mortality literature and that we will use throughout the paper.

2.1. Notation

In the literature of mortality modeling, there are generally two types of stochastic models. One type models the central mortality rate and the other type models the initial mortality rate. We define the following terms:

• Define $d_{x,t}$ as the observed number of deaths in calendar year $t$ aged $x$ last birthday, and $D_{x,t}$ as the corresponding random variable. Define the exposure $E_{x,t}$ as the average population in calendar year $t$ aged $x$ last birthday, where $x \in [a+1, a+N]$ and $t \in [1, T]$; $a$, $T$ and $N$ are non-negative integers.

• Central mortality rate, denoted by $m_{x,t}$, reflects the death probability for age $x$ last birthday in the middle of the calendar year, and it is estimated by

$$m_{x,t} = \frac{d_{x,t}}{E_{x,t}}. \tag{1}$$

• Initial mortality rate, denoted by $q_{x,t}$, is the one-year death probability for a person who is aged exactly $x$ at time $t$.

The central mortality rate is related to the initial mortality rate through the formula below:¹

$$q_{x,t} = 1 - \exp(-m_{x,t}). \tag{2}$$

Numerically, the values of the central mortality rate and the initial mortality rate are very close to each other, and it is generally believed that the relationship between $q_{x,t}$ and $m_{x,t}$ provides an accurate approximation.
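As a small numerical illustration of Eqs. (1) and (2), the sketch below computes both rates for a single age and year cell. The death count and exposure are hypothetical values chosen for illustration, and the function names are ours, not the paper's.

```python
import math

def central_rate(deaths, exposure):
    """Estimate m_{x,t} = d_{x,t} / E_{x,t} as in Eq. (1)."""
    return deaths / exposure

def initial_rate(m):
    """Convert a central rate to an initial rate via q = 1 - exp(-m), Eq. (2)."""
    return 1.0 - math.exp(-m)

# Hypothetical cell: 120 deaths against an exposure of 10,000 (illustrative only).
m = central_rate(120, 10_000)
q = initial_rate(m)
print(f"m = {m:.6f}, q = {q:.6f}")
```

For small rates the two values differ only slightly, which is the sense in which they are described as numerically close.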

2.2. CBD model

The CBD model was introduced by Cairns et al. in 2006. The model is of the form:

$$\operatorname{logit}(q_{x,t}) = \log\left(\frac{q_{x,t}}{1 - q_{x,t}}\right) = \kappa_t^1 + \kappa_t^2 (x - \bar{x}), \tag{3}$$

where $\kappa_t^1$ captures the general time effect and $\kappa_t^2$ captures the age-specific time effect. $\bar{x} = \frac{1}{N}\sum_{x=a+1}^{a+N} x$ is the average of the sample age range.

When fitting the CBD model we observe a downward sloping $\kappa_t^1$ and an overall upward sloping $\kappa_t^2$. Intuitively, the downward trend of $\kappa_t^1$ reflects the fact that the overall mortality rate for each age group is decreasing over time (or in other words life expectancy is increasing). The increasing trend of $\kappa_t^2$ suggests that improvements in mortality rates are greater at younger ages.

The strength of the model, other than its simplicity, is that it has multiple factors which result in a non-trivial correlation structure. This reflects the fact that mortality improvements are happening at different rates at different ages. However, the fitting of the model is generally better for older ages (50–89) compared to the whole age range.
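To make the structure of Eq. (3) concrete, the following sketch evaluates the one-year death probabilities implied by a single pair of period effects across the age range 50–89. The values of `kappa1` and `kappa2` are hypothetical and purely illustrative; they are not estimates from the paper.

```python
import math

def logit(p):
    """Log-odds transform used on the left-hand side of Eq. (3)."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-y))

def cbd_q(kappa1, kappa2, age, mean_age):
    """One-year death probability implied by Eq. (3) for one age and one year."""
    return inv_logit(kappa1 + kappa2 * (age - mean_age))

ages = list(range(50, 90))          # ages 50-89
mean_age = sum(ages) / len(ages)    # x-bar
kappa1, kappa2 = -3.0, 0.10         # hypothetical period effects for one year
q_curve = [cbd_q(kappa1, kappa2, x, mean_age) for x in ages]
```

With a positive `kappa2` the implied probabilities rise with age, and a lower `kappa1` shifts the whole curve down, which matches the interpretation of the two factors given above.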

2.3. MLE method

Brouhns et al. (2002) provided a fitting methodology for time series mortality models such as the Lee–Carter model using an MLE approach. This method is based on the assumption that $D_{x,t}$ follows an independently distributed Poisson distribution with parameter $E_{x,t} m_{x,t}$ for each $x$ and $t$. Thus the method allows for heteroskedasticity in mortality experience and it has become the most commonly adopted parameter estimation method for stochastic mortality models. The model is easily implemented using software such as R; see for example Renshaw and Haberman (2006), Plat (2009) and O’Hare and Li (2012). The approach has also been used in developing models for the Lifemetrics package.²

Full MLE iteration is normally used to obtain the MLE estimators. Letting $\Phi$ denote the parameter set, the natural logarithm of the likelihood function is given by:

$$l(\Phi; d, E) = \sum_{x}\sum_{t}\left\{ d_{x,t}\ln[E_{x,t} m_{x,t}(\Phi)] - E_{x,t} m_{x,t}(\Phi) - \ln(d_{x,t}!) \right\}. \tag{4}$$

¹ This relationship is a commonly recognized actuarial approximation from $m_{x,t}$ to $q_{x,t}$. For readers who are interested in the detailed derivation of the formula, please refer to: Dickson et al. (2009). Actuarial Mathematics for Life Contingent Risks. Cambridge University Press, London.

² Lifemetrics is a toolkit for measuring and managing longevity risk created by JP Morgan and adopted by the Life and Longevity Markets Association, see http://www.llma.org.

A recent study by Cairns et al. (2009) states that, given that the data is reliable, the model is correct and the Poisson assumption is also correct, after fitting the model by the MLE method we should expect the standardized residuals to have zero mean and variance of one. However, the study found that the variance of the standardized residuals was significantly greater than one for each model included in the analysis. For the CBD model, using England and Wales data for ages 60–89 over the period 1961–2004, the variance of standardized residuals was 5.9 (Cairns et al., 2009). Cairns et al. (2009) claim that a possible reason for this overdispersion phenomenon is that the exposures data are estimated. There are other possible reasons, such as model mis-specification or the Poisson assumption on the number of deaths being invalid. Even though so far there is no conclusive evidence that the overdispersion in death data has a significant effect on the forecasts of the model, the validity of using the Poisson assumption in the estimation process is still questionable, since it might underestimate the true volatility of future mortality rates.
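The Poisson log-likelihood in Eq. (4) and the overdispersion check described above can be sketched as follows. This is a minimal illustration, assuming the standardized residual is defined as $(d - Em)/\sqrt{Em}$ (the usual Pearson form under a Poisson model, which is our assumption rather than a definition given in the text); the function names and the toy inputs are hypothetical.

```python
import math

def poisson_loglik(deaths, exposures, rates):
    """Eq. (4): sum over cells of d*ln(E*m) - E*m - ln(d!)."""
    total = 0.0
    for d, e, m in zip(deaths, exposures, rates):
        mu = e * m                      # Poisson mean for this cell
        total += d * math.log(mu) - mu - math.lgamma(d + 1)  # lgamma(d+1) = ln(d!)
    return total

def standardized_residual_variance(deaths, exposures, rates):
    """Sample variance of (d - E*m)/sqrt(E*m); near 1 if the Poisson model holds."""
    z = [(d - e * m) / math.sqrt(e * m) for d, e, m in zip(deaths, exposures, rates)]
    mean = sum(z) / len(z)
    return sum((v - mean) ** 2 for v in z) / (len(z) - 1)

# Hypothetical cells: observed deaths, exposures and fitted central rates.
deaths = [90, 110]
exposures = [10_000.0, 10_000.0]
rates = [0.01, 0.01]
print(poisson_loglik(deaths, exposures, rates))
print(standardized_residual_variance(deaths, exposures, rates))
```

A residual variance well above one, computed over all ages and years of a fitted model, is the symptom of overdispersion discussed in the paragraph above.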

The MLE method was originally introduced as an alternative method to estimate the Lee–Carter model. Ordinary regression methods would not work for the Lee–Carter model since there are no regressors in the equation (Lee and Carter, 1992). However, for the CBD model, regressors are well identified as $1$ and $(x - \bar{x})$. Therefore it is feasible to estimate the model using a different methodology with looser assumptions on the error structure. This is one of the motivations of our research.

2.4. Bivariate random walk with drift

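In the current literature, $(\kappa_t^1, \kappa_t^2)$ is assumed to follow a bivariate random walk with drift process. Defining $\beta_t = (\kappa_t^1, \kappa_t^2)'$, we have

$$\beta_t = \beta_{t-1} + \mu + C Z_t, \tag{5}$$

where $\mu$ is a $2 \times 1$ vector of the drift factors and $Z_t$ is a $2 \times 1$ vector of independent standard normal random variables. $V = CC'$ is the variance–covariance matrix of the increments of $\kappa_t^1$ and $\kappa_t^2$, since the innovation $C Z_t$ has covariance $CC'$. We restrict $C$ to be an upper triangular matrix, obtained by a Cholesky-type decomposition of $V$. Let $\Delta$ be the first difference operator, with $\Delta\kappa_t^i = \kappa_{t+1}^i - \kappa_t^i$ for $i = 1, 2$. $\mu$ and $V$ can then be estimated from $\kappa_t^1$ and $\kappa_t^2$ by

• $\hat{\mu} = \begin{pmatrix} E(\Delta\kappa_t^1) \\ E(\Delta\kappa_t^2) \end{pmatrix}$.

• $\hat{V} = \begin{pmatrix} E[(\Delta\kappa_t^1 - E(\Delta\kappa_t^1))(\Delta\kappa_t^1 - E(\Delta\kappa_t^1))] & E[(\Delta\kappa_t^1 - E(\Delta\kappa_t^1))(\Delta\kappa_t^2 - E(\Delta\kappa_t^2))] \\ E[(\Delta\kappa_t^1 - E(\Delta\kappa_t^1))(\Delta\kappa_t^2 - E(\Delta\kappa_t^2))] & E[(\Delta\kappa_t^2 - E(\Delta\kappa_t^2))(\Delta\kappa_t^2 - E(\Delta\kappa_t^2))] \end{pmatrix}$.

The forecast of $\beta_t$ from one sample path is then given by

$$\hat{\beta}_{T+1} = \hat{\beta}_T + \hat{\mu} + \hat{C} Z_{T+1}. \tag{6}$$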
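The estimation and simulation steps of this subsection can be sketched as below. This is an illustration only: it assumes the factorization $V = CC'$ with $C$ upper triangular (so that $CZ_t$ has covariance $V$), replaces the expectations by sample averages over the observed first differences, and uses short hypothetical series for the two factors. All function names are ours.

```python
import math
import random

def fit_rw_drift(k1, k2):
    """Estimate drift mu and covariance V from first differences of the two factors."""
    d1 = [b - a for a, b in zip(k1, k1[1:])]
    d2 = [b - a for a, b in zip(k2, k2[1:])]
    n = len(d1)
    mu = (sum(d1) / n, sum(d2) / n)
    # Sample averages stand in for the expectations E(.) in the text.
    v11 = sum((a - mu[0]) ** 2 for a in d1) / n
    v22 = sum((b - mu[1]) ** 2 for b in d2) / n
    v12 = sum((a - mu[0]) * (b - mu[1]) for a, b in zip(d1, d2)) / n
    return mu, (v11, v12, v22)

def upper_factor(v11, v12, v22):
    """Upper triangular C = [[c11, c12], [0, c22]] with C C' = V (needs v22 > 0)."""
    c22 = math.sqrt(v22)
    c12 = v12 / c22
    c11 = math.sqrt(max(v11 - c12 ** 2, 0.0))
    return c11, c12, c22

def simulate_path(last, mu, c, steps, rng):
    """Apply Eq. (6) repeatedly to draw one sample path of (kappa1, kappa2)."""
    c11, c12, c22 = c
    b1, b2 = last
    path = []
    for _ in range(steps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b1 += mu[0] + c11 * z1 + c12 * z2
        b2 += mu[1] + c22 * z2
        path.append((b1, b2))
    return path

# Hypothetical factor series (illustrative values only).
k1 = [-2.50, -2.54, -2.57, -2.62, -2.66]
k2 = [0.100, 0.101, 0.103, 0.104, 0.106]
mu, v = fit_rw_drift(k1, k2)
c = upper_factor(*v)
path = simulate_path((k1[-1], k2[-1]), mu, c, steps=5, rng=random.Random(0))
```

Repeating `simulate_path` many times with different random draws would give a distribution of future factor values rather than a single path.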

2.5. Other possible approaches

Despite the fact that this method has been widely used to project future mortality rates, it is questionable whether the bivariate random walk assumption, or at least the use of a single rate of drift, is appropriate. Other approaches to the estimation and forecasting of the CBD model have been discussed by Sweeting (2011) and Cannon (2009). Instead of the bivariate random walk with drift process, Sweeting claimed that the coefficients in the model have deterministic trends which will change periodically. Statistical techniques were applied to determine the significant change points in each trend. The historical frequency of change can then be used to project future mortality experience (Sweeting, 2011). Cannon (2009) generalized the dynamic process of the two time factors $\kappa_t^1$ and $\kappa_t^2$. He considered three possibilities, including the bivariate random walk process, the deterministic trends process proposed by Sweeting (2011) and the case that the two factors $\kappa_t^1$ and $\kappa_t^2$ are cointegrated (Cannon, 2009).

None of the above methods take into account the possibility of smoothness in the estimates. In this paper we express the model as a time-varying coefficient panel model and thus consider the possibility that $\kappa_t^1$ and $\kappa_t^2$ are actually smooth functions of time. Semiparametric varying coefficient models have been applied to many aspects of the social sciences to analyze time series data and cross-sectional data. More recently, panel data, or longitudinal data, have become widely available due to the increasing capacity to handle large data sets. The use of semiparametric varying coefficient models can also be extended to two-dimensional data sets, for example the interval censored data in medical studies and the national climate data set (Shao et al., 2014; Li et al., 2011). Since mortality data can clearly be considered as a two-dimensional data set, it seems appropriate to consider applying the semiparametric panel data approach to mortality. To our knowledge, there have not been any investigations of mortality data using this type of approach. This is the starting point of our research.

3. A semiparametric varying coefficient panel model

The general form of the univariate semiparametric varying coefficient model was first proposed by Hastie and Tibshirani (1993). In order to model mortality rates, we re-express model (3) as a time-varying coefficient panel model. Let $i = x - a$ be the age group from 1 to $N$, with $a + 1$ and $a + N$ representing the minimum age and the maximum age in that range. For $i \in [1, N]$ and $t \in [1, T]$, define

• $Y_{it} = \operatorname{logit}(q_{x,t})$.

• $X_i = \begin{pmatrix} 1 \\ x - \bar{x} \end{pmatrix}$.

• $\beta_t = \begin{pmatrix} \kappa_t^1 \\ \kappa_t^2 \end{pmatrix}$, where $\kappa_t^1$ and $\kappa_t^2$ are assumed to be smooth functions of $t$.

We have

$$Y_{it} = \operatorname{logit}(q_{x,t}) = \kappa_t^1 + \kappa_t^2 (x - \bar{x}) = X_i'\beta_t. \tag{7}$$

In this way we can use estimation methods from the time-varying coefficient model literature to estimate parameters in the mortality rate model.

3.1. Local linear estimation

To fit the model we use local linear estimation methods. The intuition behind local linear estimation is to assume that $\beta_t$ is a linear function of time in the neighborhood of $t$. Therefore, for each value of $t$, we can provide an estimate of $\beta_t$ by fitting a straight line based on the local information. The amount of local information used to construct the linear function is determined by a bandwidth $h$, and the corresponding weights for this local information are set by a kernel function $K$.

Following the notation and assumptions from the studies of time-varying coefficients by Robinson (1989) and Cai (2007), we define

$$\beta_t = \beta(\tau), \quad \text{where } \tau = t/T \text{ and } t \in [1, T]. \tag{8}$$

Assuming $\beta(\tau)$ has continuous derivatives up to second order, by Taylor expansion: $\beta(\tau) = \beta(\tau_0) + \beta^{(1)}(\tau_0)(\tau - \tau_0) + O[(\tau - \tau_0)^2] \approx \beta(\tau_0) + \beta^{(1)}(\tau_0)(\tau - \tau_0)$, where $\tau_0 \in [0, 1]$. Thus the model can be approximated by

$$Y_{it} = X_i'\beta(\tau) \approx X_i'[\beta(\tau_0) + \beta^{(1)}(\tau_0)(\tau - \tau_0)]. \tag{9}$$

We find estimators of the time-varying coefficients at $\tau_0$ by minimizing the following weighted sum of squares:

$$\sum_{i=1}^{N}\sum_{t=1}^{T} \{Y_{it} - X_i'[\beta(\tau_0) + \beta^{(1)}(\tau_0)(\tau - \tau_0)]\}^2 K_h(\tau - \tau_0), \tag{10}$$

where $K$ is the kernel function and $K_h(u) = h^{-1}K(u/h)$. We also assume that as $T \to \infty$, $h \to 0$ and $Th \to \infty$ for asymptotic properties of the estimator.

To derive the matrix-form expression of the local linear estimator, we further define the following terms:

• $Y = (Y_1', \ldots, Y_N')'$, where $Y_i = (Y_{i1}, \ldots, Y_{iT})'$ for $i = 1, 2, \ldots, N$.

• For given $\tau_0 \in [0, 1]$, define $M(\tau_0) = (M_1'(\tau_0), \ldots, M_N'(\tau_0))'$ with

$$M_i(\tau_0) = \begin{pmatrix} X_i' & X_i'\left(\frac{1}{T} - \tau_0\right) \\ \vdots & \vdots \\ X_i' & X_i'\left(\frac{T}{T} - \tau_0\right) \end{pmatrix}.$$

• For given $\tau_0 \in [0, 1]$, define $W(\tau_0)$ to be a $T \times T$ diagonal matrix with the $t$th diagonal element equal to $K_h\left(\frac{t}{T} - \tau_0\right)$, and $\widetilde{W}(\tau_0) = I_N \otimes W(\tau_0)$, where $I_N$ is an $N \times N$ identity matrix and $\otimes$ denotes the Kronecker product.³

We express Eq. (10) in matrix form as

$$\{Y - M(\tau_0)(\beta'(\tau_0), \beta^{(1)\prime}(\tau_0))'\}' \, \widetilde{W}(\tau_0) \, \{Y - M(\tau_0)(\beta'(\tau_0), \beta^{(1)\prime}(\tau_0))'\}. \tag{11}$$

Minimizing Eq. (11) with respect to $(\beta'(\tau_0), \beta^{(1)\prime}(\tau_0))'$, we obtain the estimate of $\beta(\tau_0)$ as a weighted least squares estimate

$$\hat{\beta}(\tau_0) = [I_2, 0_2]\,[M'(\tau_0)\widetilde{W}(\tau_0)M(\tau_0)]^{-1} M'(\tau_0)\widetilde{W}(\tau_0)Y, \tag{12}$$

where $I_2$ is a $2 \times 2$ identity matrix and $0_2$ is a $2 \times 2$ null matrix.
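The weighted least squares problem behind Eq. (12) can be sketched in plain Python as below. This is an illustrative implementation under our own assumptions, not the authors' code: it accumulates the $4 \times 4$ normal equations directly instead of forming the Kronecker-structured weight matrix, uses the Epanechnikov kernel adopted in Section 3.2, and solves the system with a small Gaussian elimination helper. The names `solve` and `local_linear_beta` and the synthetic data are hypothetical.

```python
def epanechnikov(u):
    """Kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def local_linear_beta(y, z, tau0, h):
    """Return (kappa1, kappa2) at tau0.

    y[i][t] holds Y_it for age index i and year index t (0-based);
    z[i] holds the centred age x - x_bar, so that X_i = (1, z[i]).
    """
    n_years = len(y[0])
    a = [[0.0] * 4 for _ in range(4)]
    b = [0.0] * 4
    for t in range(n_years):
        u = (t + 1) / n_years - tau0          # tau - tau0 with tau = t/T
        w = epanechnikov(u / h) / h           # K_h(tau - tau0)
        if w == 0.0:
            continue
        for i, zi in enumerate(z):
            row = [1.0, zi, u, zi * u]        # one row of M_i(tau0)
            for r in range(4):
                b[r] += w * row[r] * y[i][t]
                for k in range(4):
                    a[r][k] += w * row[r] * row[k]
    # The selector [I_2, 0_2] keeps the first two components, i.e. beta(tau0).
    return tuple(solve(a, b)[:2])

# Synthetic check (hypothetical data): coefficients that are exactly linear in tau
# should be recovered without error by a local linear fit.
T, z = 20, [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [[(-3.0 - 0.5 * (t / T)) + (0.1 + 0.02 * (t / T)) * zi for t in range(1, T + 1)]
     for zi in z]
k1, k2 = local_linear_beta(y, z, tau0=0.5, h=0.3)
```

Evaluating `local_linear_beta` on a grid of `tau0` values gives the smooth coefficient curves that the method is designed to produce. The window must contain at least two distinct years with positive weight, otherwise the normal equations are singular.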

3.2. Bandwidth selection

There are several choices for the kernel function; here we adopt the Epanechnikov kernel function:

$$K(u) = 0.75(1 - u^2)\,I(|u| \le 1). \tag{13}$$
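To see how the kernel and the bandwidth translate into weights on calendar years, the sketch below evaluates $K_h(t/T - \tau_0)$ for every year in a sample. The setting ($T = 60$ years, $\tau_0 = 0.5$, $h = 0.145$) is an illustrative choice of ours, and the function names are hypothetical.

```python
def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere (Eq. 13)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_weights(n_years, tau0, h):
    """K_h(t/T - tau0) = K((t/T - tau0) / h) / h for t = 1, ..., T."""
    return [epanechnikov((t / n_years - tau0) / h) / h for t in range(1, n_years + 1)]

weights = kernel_weights(60, 0.5, 0.145)      # illustrative T, tau0 and h
active_years = [t for t, w in enumerate(weights, start=1) if w > 0.0]
print(len(active_years), active_years[0], active_years[-1])
```

Only years whose rescaled distance from $\tau_0$ is below $h$ receive positive weight, so a smaller bandwidth means each local fit relies on fewer years of data.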

The performance of a local linear estimator is mainly determined by its bandwidth rather than the form of the kernel function. Therefore, the fitting result of the proposed method will depend highly on the choice of $h$. In our analysis, the bandwidth is selected by the following leave-one-out cross-validation criterion:

$$h_{\mathrm{opt}} = \arg\min_{h} CV(h), \tag{14}$$

where

$$CV(h) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} [Y_{it} - X_i'\hat{\beta}_{(-\tau)}(\tau)]^2, \tag{15}$$

and $\hat{\beta}_{(-\tau)}(\tau)$ is calculated by (12) with the $t$th diagonal element of $W(\tau)$ equal to 0.

3.3. Local linear forecasting

As mentioned in Section 3.1, the local linear estimator $\hat{\beta}(\tau_0)$ can be evaluated at any $\tau_0 \in [0, 1]$. In order to do a 1-year-ahead forecast, we redefine the following terms:

• $\beta_t = \beta(\tau)$, where $\tau = t/(T + 1)$ and $t \in [1, T]$.

• For given $\tau_0 \in [0, 1]$, define $M(\tau_0) = (M_1'(\tau_0), \ldots, M_N'(\tau_0))'$ with

$$M_i(\tau_0) = \begin{pmatrix} X_i' & X_i'\left(\frac{1}{T+1} - \tau_0\right) \\ \vdots & \vdots \\ X_i' & X_i'\left(\frac{T}{T+1} - \tau_0\right) \end{pmatrix}.$$

• For given $\tau_0 \in [0, 1]$, define $W(\tau_0)$ to be a $T \times T$ diagonal matrix with the $t$th diagonal element equal to $K_h\left(\frac{t}{T+1} - \tau_0\right)$, and $\widetilde{W}(\tau_0) = I_N \otimes W(\tau_0)$, where $I_N$ is an $N \times N$ identity matrix.

Following the same procedure described in Sections 3.1 and 3.2, we set $\tau_0 = \frac{T+1}{T+1} = 1$ and obtain the 1-year-ahead forecast at $T + 1$ as

$$\hat{\beta}_{T+1} = \hat{\beta}(\tau_0) = [I_2, 0_2]\,[M'(\tau_0)\widetilde{W}(\tau_0)M(\tau_0)]^{-1} M'(\tau_0)\widetilde{W}(\tau_0)Y. \tag{16}$$

Finally, putting the forecast value of $Y_{i,T+1}$ into the estimation sample and repeating the process $(n - 1)$ times, we obtain an $n$-year-ahead forecast for mortality rates.

³ For an explanation of the Kronecker product, see Hayashi (2000). Econometrics. Princeton University Press, Princeton.

3.4. Parametric approximation of $\kappa_t^1$ and $\kappa_t^2$

In the estimation process, it is assumed that the time-varying coefficients are smooth functions of $t$. Local linear estimation allows us to evaluate $\kappa_t^1$ and $\kappa_t^2$ at each given $t$ without knowing the exact forms of the functions. When connecting those estimates together for different $t$ by plotting $\kappa_t^1$ and $\kappa_t^2$ against time, we can see by visual inspection that polynomials may be used to approximate the functions of $\kappa_t^1$ and $\kappa_t^2$. Define an $n$th order polynomial as

$$P_n(t) = P_n'(\tau) = a_0 + a_1\tau + a_2\tau^2 + a_3\tau^3 + \cdots + a_n\tau^n, \quad \text{where } \tau = t/T \text{ and } t \in [1, T]. \tag{17}$$

We approximate $\kappa_t^1$ and $\kappa_t^2$ by $P_{m_1}(t)$ and $P_{m_2}(t)$ respectively. Model (3) can be rewritten as

$$\operatorname{logit}(q_{x,t}) = P_{m_1}(t) + P_{m_2}(t)(x - \bar{x}). \tag{18}$$

We use the ordinary least squares (OLS) method to estimate the coefficients of $P_{m_1}(t)$ and $P_{m_2}(t)$ and select the optimal values of $m_1$ and $m_2$ using the Bayesian information criterion (BIC). In this paper we follow the BIC procedure used by Hayashi (2000), which chooses the values of $m_1$ and $m_2$ that minimize

$$\log(SSR_{m_1 m_2}/NT) + (m_1 + m_2 + 2)\log(NT)/NT, \tag{19}$$

where $SSR_{m_1 m_2}$ is the sum of squared residuals for polynomials of degree $m_1$ and $m_2$.
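The order selection rule in Eq. (19) can be illustrated with a short sketch. It assumes the sums of squared residuals have already been obtained from OLS fits for each candidate pair of orders; the SSR values below are hypothetical numbers for illustration, and the function names are ours.

```python
import math

def bic(ssr, n_ages, n_years, m1, m2):
    """Criterion (19): log(SSR/NT) + (m1 + m2 + 2) log(NT)/NT."""
    nt = n_ages * n_years
    return math.log(ssr / nt) + (m1 + m2 + 2) * math.log(nt) / nt

def select_orders(ssr_by_order, n_ages, n_years):
    """Pick the (m1, m2) pair minimising the criterion; keys are (m1, m2), values are SSRs."""
    return min(ssr_by_order, key=lambda k: bic(ssr_by_order[k], n_ages, n_years, *k))

# Hypothetical SSRs for three candidate order pairs with N = 40 ages and T = 60 years.
candidates = {(2, 4): 12.40, (3, 4): 11.90, (3, 6): 11.88}
best = select_orders(candidates, n_ages=40, n_years=60)
print(best)
```

The penalty term counts $m_1 + m_2 + 2$ coefficients, so a higher order is selected only when it lowers the SSR enough to offset the extra parameters.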

To create an $n$-year-ahead mortality forecast using this approach, we redefine $P_n(t)$ as

$$P_n(t) = P_n'(\tau), \quad \text{where } \tau = t/(T + n) \text{ and } t \in [1, T]. \tag{20}$$

Repeating the previous steps to determine the optimal orders for the polynomials and estimate the coefficients using the OLS method, we then place the values $\tau = \frac{T+1}{T+n}, \frac{T+2}{T+n}, \ldots, \frac{T+n}{T+n}$ into Eq. (18), and get the $n$-year-ahead forecast for future mortality rates.

4. Data

As defined in Section 2, the central mortality rate $m_{x,t}$ is estimated from deaths and exposures data. The number of deaths is generally considered to be accurate because it is an actual number recorded from death certificates. The exposures data $E_{x,t}$ is less accurate, being estimated from census data. Since census data is only available at 5 or 10 year intervals, it can only give us a rough approximation to the real exposure. The deaths and exposures data used in this paper are downloaded from the Human Mortality Database (HMD).⁴

Fig. 1. Estimated values of (a) $\kappa_t^1$, (b) $\kappa_t^2$ from local linear approach, (c) $\kappa_t^1$ and (d) $\kappa_t^2$ from MLE approach for GB male from 1950 to 2009.

There has been a considerable amount of study on the mortality experience of England and Wales and the US; see Cairns et al. (2006, 2009). In this paper, we will apply the proposed methods to male mortality rates from a wider range of countries. The countries included in the analysis are: Great Britain (GB), United States (US), Australia (AUS), Netherlands (NL), Japan (JAP), France (FR) and Spain (SP). To ensure the reliability of our data, we only consider the post-war mortality rates from 1950 to 2009. Longer historical data is also available, but data quality issues might occur during the period of the two World Wars (1914–1918 and 1939–1945). Further, since the CBD model was specifically designed for older ages we use the 50–89 age range.

5. Empirical results and analysis

This section starts with a comparison of the fit quality of the proposed local linear estimation, OLS estimation for the parametric approximation and the MLE approach. We then assess the forecasting performance of the two new methods introduced in Sections 3.3 and 3.4 together with the bivariate random walk method described in Section 2.4 by looking at the 5-year-ahead forecasting results from 2005 to 2009.

5.1. Fit quality comparison

To empirically compare the fit quality of the three methods, the data sets described in Section 4 are used to fit the CBD model. Fig. 1 plots the estimates of $\kappa_t^1$ and $\kappa_t^2$ from the MLE approach and the local linear approach for GB male mortality data. We see from the graphs that the overall shapes of the estimates from the two methods are very similar. Numerically, the values of the estimates are also quite close. The main difference is that estimates of $\kappa_t^1$ and $\kappa_t^2$ from the MLE approach seem to follow a stochastic process, while those from the local linear approach look like smooth functions of $t$. The plots of estimates for the rest of the countries included in the analysis show similar results and are available upon request.

⁴ The HMD mortality database can be found at http://www.mortality.org.

Following the investigation of O’Hare and Li (2012) among others, we adopt and define the following fitting measures, where $\hat{m}_{x,t}$ denotes the fitted rate and $m_{x,t}$ the observed rate:

The average error (E1), which measures the overall bias, is given by

$$E1 = \frac{1}{NT}\sum_{x}\sum_{t} \frac{\hat{m}_{x,t} - m_{x,t}}{m_{x,t}}. \tag{21}$$

The absolute average error (E2), also known as the Mean Absolute Percentage Error (MAPE), which measures the magnitude of the deviance, is given by

$$E2 = \frac{1}{NT}\sum_{x}\sum_{t} \frac{|\hat{m}_{x,t} - m_{x,t}|}{m_{x,t}}. \tag{22}$$

The standard deviation of error (E3), which is a measure to detect large deviance, is given by

$$E3 = \sqrt{\frac{1}{NT}\sum_{x}\sum_{t} \left(\frac{\hat{m}_{x,t} - m_{x,t}}{m_{x,t}}\right)^2}. \tag{23}$$
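The three measures can be computed together from paired observed and fitted rates, as in the sketch below. It assumes the relative error is defined as (fitted minus observed) divided by observed, and that the rates are supplied as flat lists over all age and year cells; the function name and the toy inputs are hypothetical.

```python
import math

def fit_measures(actual, fitted):
    """Return (E1, E2, E3) as fractions; multiply by 100 to express them in percent."""
    rel = [(f - a) / a for a, f in zip(actual, fitted)]
    n = len(rel)
    e1 = sum(rel) / n                              # average error (bias)
    e2 = sum(abs(r) for r in rel) / n              # mean absolute percentage error
    e3 = math.sqrt(sum(r * r for r in rel) / n)    # root mean squared relative error
    return e1, e2, e3

# Hypothetical observed and fitted central mortality rates for two cells.
e1, e2, e3 = fit_measures([0.010, 0.020], [0.011, 0.018])
print(e1, e2, e3)
```

Errors of opposite sign cancel in E1 but not in E2 or E3, which is why a model can show a small bias while still having sizeable absolute errors.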

It can be seen from Table 1 that, compared with the MLE method, both local linear estimation and the parametric estimation give a much smaller absolute value of E1. This implies that the estimates are less biased. On the other hand, the values of E2 and E3 are slightly higher. Overall, estimates from the local linear estimation and parametric estimation provide comparable fitting results with those from the MLE. The fitting of local linear estimation is generally better than the polynomial estimation. This result is not surprising because kernel smoothing can be viewed as an approximation with polynomials of infinite order. By the BIC criterion, we can see that for a majority of countries included in the analysis, the optimal order for $\kappa_t^1$ is 3 and the optimal order for $\kappa_t^2$ is 4. The maximum order of the polynomials is 6, from GB and US data. Further research will look at why such polynomials appear to be the best fitting models.

Table 1. Fitting results of the CBD model for ages 50–89 from 1950 to 2009. (LL = local linear, P = parametric, MLE = original MLE.)

| Country | LL h | LL E1 (%) | LL E2 (%) | LL E3 (%) | P m1 | P m2 | P E1 (%) | P E2 (%) | P E3 (%) | MLE E1 (%) | MLE E2 (%) | MLE E3 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GB | 0.145 | −0.16 | 4.22 | 5.27 | 3 | 6 | 0.12 | 4.28 | 5.40 | 1.01 | 3.95 | 5.40 |
| US | 0.055 | 0.07 | 2.86 | 3.79 | 3 | 6 | 0.11 | 3.27 | 4.24 | −0.33 | 2.78 | 3.79 |
| AUS | 0.105 | 0.05 | 4.20 | 5.36 | 3 | 4 | 0.16 | 4.61 | 5.89 | 0.49 | 3.79 | 5.18 |
| NL | 0.065 | 0.07 | 4.25 | 5.29 | 3 | 5 | 0.15 | 4.43 | 5.57 | 0.36 | 4.07 | 5.36 |
| JAP | 0.105 | 0.21 | 4.74 | 6.01 | 3 | 4 | 0.24 | 5.12 | 6.55 | −0.42 | 4.21 | 5.68 |
| FR | 0.135 | 0.22 | 6.33 | 7.98 | 2 | 4 | 0.40 | 6.43 | 8.14 | −1.13 | 6.12 | 7.83 |
| SP | 0.145 | 0.19 | 5.07 | 6.76 | 2 | 4 | 0.27 | 5.31 | 6.99 | −0.55 | 4.50 | 6.19 |

Table 2. Local linear fitting results for GB male mortality data under different h.

| h | E1 (%) | E2 (%) | E3 (%) |
|---|---|---|---|
| 0.01 | 0.091 | 3.86 | 4.91 |
| 0.02 | 0.057 | 3.94 | 5.00 |
| 0.04 | 0.067 | 4.10 | 5.18 |
| 0.08 | 0.057 | 4.18 | 5.25 |
| 0.16 | −0.22 | 4.23 | 5.29 |

As mentioned before, the bandwidth h plays an important part in local linear estimation. The choice of bandwidth should depend on the purpose of smoothing and thus can be subjective (Härdle, 1990). The smaller the value of h, the better the fitting quality. However, there is always a trade-off between variance and bias. The improvement in fitting quality from choosing a smaller h would not bring any benefit to modeling, especially from the forecasting perspective. Taking GB mortality data as an example, Table 2 shows the fitting results for local linear estimation under different choices of h.

As h approaches 0, the local linear estimation becomes a column-by-column OLS estimation. In terms of fitting quality, column-by-column OLS estimation would be the best estimation method, but it might capture too much noise in the data.
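The effect of the bandwidth on in-sample fit can be illustrated with a small Python sketch. It is not the authors' implementation: the Epanechnikov kernel, the rescaling of time to (0, 1], the synthetic data and the names `epanechnikov` and `local_linear_cbd` are all assumptions made for illustration. The sketch fits the CBD form, logit rate = κ1_t + κ2_t (x − x̄), with coefficients that are locally linear in time.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice); zero outside |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear_cbd(y, ages, h):
    """Local linear estimates of kappa1_t and kappa2_t.

    y: (N, T) array of logit mortality rates; ages: length-N array.
    Time is rescaled to tau_t = t / T, so h is a fraction of the sample span.
    """
    n_ages, n_years = y.shape
    tau = np.arange(1, n_years + 1) / n_years
    z = ages - ages.mean()                        # centred age, x - x_bar
    Z = np.tile(z, n_years)                       # age term per (year, age) cell
    target = y.T.ravel()                          # year-major ordering of cells
    kappa1 = np.empty(n_years)
    kappa2 = np.empty(n_years)
    for s in range(n_years):
        d = tau - tau[s]                          # time distance to target year
        sw = np.sqrt(np.repeat(epanechnikov(d / h), n_ages))
        D = np.repeat(d, n_ages)
        X = np.column_stack([np.ones_like(D), D, Z, Z * D])
        # Weighted least squares through square-root weights
        coef, *_ = np.linalg.lstsq(X * sw[:, None], target * sw, rcond=None)
        kappa1[s], kappa2[s] = coef[0], coef[2]   # local intercepts
    return kappa1, kappa2

# Synthetic logit rates (NOT real mortality data)
rng = np.random.default_rng(0)
ages = np.arange(50, 90)
T = 60
tau = np.arange(1, T + 1) / T
z = ages - ages.mean()
k1_true = -3.0 - 0.8 * tau + 0.2 * np.sin(6 * tau)
k2_true = 0.09 + 0.01 * tau
y = (k1_true[None, :] + z[:, None] * k2_true[None, :]
     + rng.normal(0.0, 0.05, (ages.size, T)))

def in_sample_rmse(h):
    k1, k2 = local_linear_cbd(y, ages, h)
    fitted = k1[None, :] + z[:, None] * k2[None, :]
    return np.sqrt(np.mean((y - fitted) ** 2))

rmse_small, rmse_large = in_sample_rmse(0.02), in_sample_rmse(0.5)
print(rmse_small, rmse_large)
```

On this synthetic example the smaller bandwidth follows the data more closely in sample, but a closer in-sample fit also means more of the noise is absorbed into the estimated coefficients.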

5.2. Forecasting performance

Good forecasting ability is an essential feature of mortality models, as projections of future mortality rates are used to price insurance products. This section shows the forecasting results of the proposed methods. Following the study of Dowd et al. (2010), backtesting of the model has been carried out to assess the forecasting performance. The CBD model is fitted using each method to mortality data of all seven countries from 1950 to 2004. We then produce 5-year-ahead forecasts and compare them with the actual observations for the period 2005–2009 using the E1, E2 and E3 measures. Forecasts using all three methods are presented in Table 3.
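The backtesting design can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the procedure used to produce the reported results: the data are synthetic, the CBD coefficients are obtained by year-by-year OLS on logit rates, the forecast is a random walk with drift applied to each coefficient separately, and all function names are hypothetical.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(v):
    return 1.0 / (1.0 + np.exp(-v))

def fit_cbd_ols(rates, ages):
    """Year-by-year OLS of logit(rate) on centred age; returns (kappa1, kappa2)."""
    z = ages - ages.mean()
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, logit(rates), rcond=None)  # shape (2, T)
    return coef[0], coef[1]

def rw_drift_forecast(kappa, horizon):
    """Random walk with drift point forecast: last value + k * mean increment."""
    drift = np.diff(kappa).mean()
    return kappa[-1] + drift * np.arange(1, horizon + 1)

# Synthetic stand-in for a mortality surface (NOT real data)
rng = np.random.default_rng(1)
years = np.arange(1950, 2010)
ages = np.arange(50, 90)
z = ages - ages.mean()
k1 = -3.0 - 0.015 * (years - 1950)
k2 = 0.09 + 0.0002 * (years - 1950)
rates = inv_logit(k1[None, :] + z[:, None] * k2[None, :]
                  + rng.normal(0.0, 0.03, (ages.size, years.size)))

# Fit on 1950-2004, forecast 2005-2009
train = years <= 2004
horizon = int((~train).sum())
k1_hat, k2_hat = fit_cbd_ols(rates[:, train], ages)
k1_f = rw_drift_forecast(k1_hat, horizon)
k2_f = rw_drift_forecast(k2_hat, horizon)
forecast = inv_logit(k1_f[None, :] + z[:, None] * k2_f[None, :])

# Compare forecasts with the held-out observations
actual = rates[:, ~train]
rel_err = (forecast - actual) / actual
e1 = rel_err.mean()
e2 = np.abs(rel_err).mean()
e3 = np.sqrt((rel_err ** 2).mean())
print(f"E1={100 * e1:.2f}%  E2={100 * e2:.2f}%  E3={100 * e3:.2f}%")
```

The key point of the design is that the years 2005–2009 play no role in estimation, so the three measures reflect genuine out-of-sample performance.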

As mentioned in Section 5.1, the three measures E1, E2 and E3 focus on different aspects of goodness of fit. In terms of forecasting performance, from a pricing actuary’s point of view, E1 could be very important, as bias could lead to underestimated or overestimated mortality rates and the insurance company could incur a significant loss as a result. If the aim of future mortality projection is to ensure a high degree of overall accuracy of the forecast, then E2 would be the measure to focus on. E3 detects large deviance between the mortality forecast and the actual mortality experience, and thus a large value of E3 could be an indicator of anomalies in the forecast. In our analysis, large deviance does not appear to be a problem.

We see from Table 3 that, overall, the forecasting results from the local linear forecast and the parametric forecast are comparable to other model results. The local linear forecast gives the most accurate forecast in 6 out of 7 countries on all three measures. For example, the local linear mortality forecast for GB is around 2 percentage points better on the E1, E2 and E3 measures than the bivariate random walk forecast. For AUS, the local linear forecast is marginally better on E1, and between 1 and 1.5 percentage points better on the E2 and E3 measures. Only in the case of the US does the local linear forecast show some degree of bias on the E1 measure and a poorer performance on the E3 measure. It would be worthwhile to undertake further investigation to understand the reason behind this. The forecasting performance of the parametric method is slightly poorer than that of the local linear method, but it seems to be comparable with the bivariate random walk forecast on the E1 measure and better than the bivariate random walk forecast on the E2 and E3 measures. Fig. 2 compares the 5-year-ahead mortality forecasts from the three methods with the actual mortality experience for GB males aged 50, 60, 70 and 80. It can be seen from these plots that the local linear forecast and the parametric forecast are more accurate than the bivariate random walk forecast in most cases, which is consistent with our conclusion based on the values of the three measures. The equivalent forecasting plots for the rest of the countries are available upon request.

The reason the local linear forecast method outperforms the other two methods might be that it mainly uses local information rather than global information to project future mortality rates. Based on the formula in Section 3.3, the bandwidth h will set greater weights on more recent data and lighter weights on historical data in the forecasting procedure. Recent mortality experience can be expected to have greater predictive power than older mortality experience. On the other hand, both the parametric forecast method and the bivariate random walk forecast method set equal weights on past and recent information for future mortality forecasts. The estimation of the drift factor µ mentioned earlier in Section 2.4 will depend heavily on the starting time and ending time of the investigation. Different observation periods could lead to quantitative differences in µ and thus to unstable forecasting performance of the method. Therefore the bivariate random walk method would need a relatively longer period of mortality data to determine the drift factor, and would assume that the long-term pattern in the past will continue in the future, which is sometimes not realistic. As a result this method might take less relevant information into account, and thus the local linear forecast method would give more accurate forecast results.
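The sensitivity of the drift to the observation window follows from simple arithmetic: when the drift of a random walk is estimated by the mean of the first differences, the sum telescopes and only the first and last values of the series matter. The Python sketch below illustrates this for a single coefficient series; the series is made up and the function name `drift_estimate` is hypothetical.

```python
import numpy as np

def drift_estimate(kappa):
    """Mean of first differences, a common drift estimate for a random walk with drift."""
    return np.diff(kappa).mean()

# Hypothetical kappa series whose decline accelerates over time (made-up numbers)
years = np.arange(1950, 2005)
kappa = -3.0 - 0.005 * (years - 1950) - 0.0002 * (years - 1950) ** 2

mu_full = drift_estimate(kappa)                    # window 1950-2004
mu_recent = drift_estimate(kappa[years >= 1980])   # window 1980-2004

# The sum of differences telescopes, so only the two endpoints matter
endpoint_form = (kappa[-1] - kappa[0]) / (kappa.size - 1)

print(mu_full, mu_recent, endpoint_form)
```

In this example the longer window averages the early, slower decline with the later, faster one, so the drift estimated from 1950 understates the most recent rate of improvement.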

6. Conclusion

In this paper we proposed a local linear kernel method to estimate the CBD model using mortality data from a range of developed countries. The "overdispersion" phenomenon in mortality data suggests that the Poisson assumption on the number of deaths might not be valid. This makes the proposed semiparametric approach more attractive than the existing estimation methods in the literature.

The coefficients in the CBD model are treated as smooth functions of time instead of stochastic processes. After fitting the model using local linear estimation we also give approximate functional forms of the coefficients. Comparing the fitting quality with the MLE method using the E1, E2 and E3 measures, we conclude that the proposed estimation methods provide comparable fitting results but without the need for the additional assumption on the number of deaths. New forecasting methods have also been introduced. Based on the 5-year-ahead forecasting results from 2005 to 2009, we have shown that our forecasting methods give more accurate predictions than the bivariate random walk method in the current literature.

Table 3. 5-year-ahead forecasting results of the CBD model for ages 50–89 from 2005 to 2009.

         Local linear (%)          Parametric (%)            Random walk (%)
Country  E1      E2      E3        E1      E2      E3        E1      E2      E3
GB       −0.01   4.69    5.64      −3.79   5.32    7.40      2.76    6.10    7.51
US       −5.24   6.89    9.35      1.81    7.20    8.80      −0.71   6.91    8.44
AUS      −0.56   6.50    8.14      −4.96   7.24    9.39      0.65    7.48    9.68
NL       1.52    5.62    6.85      7.15    7.98    9.82      7.80    8.92    10.89
JAP      −1.48   6.96    8.29      8.74    10.93   13.30     −3.66   7.43    8.64
FR       0.64    11.45   13.04     −0.84   11.66   13.38     −3.18   12.16   14.58
SP       1.53    8.21    9.44      1.94    8.91    10.38     −2.02   9.10    11.19

Fig. 2. Mortality rates from 1990 with 5-year-ahead forecast from 2005 to 2009 for GB male aged (a) 50, (b) 60, (c) 70 and (d) 80.

Further research will consider the question of the order of the polynomials used to fit the smooth functions of $\kappa_t^1$ and $\kappa_t^2$, including a wider range of countries and genders. In addition, the approach we have implemented here in the case of the CBD model can be easily adapted to fit other larger time series models for mortality, for example the Plat (2009) or O’Hare and Li (2012) models. We aim to consider these in future work.

Acknowledgments

We are indebted to colleagues from the Department of Econometrics and Business Statistics, Monash University for valuable feedback received.

References

Booth, H., Maindonald, J., Smith, L., 2002. Applying Lee–Carter under conditions of mortality decline. Popul. Stud. 56, 325–326.

Brouhns, N., Denuit, M., Vermunt, J.K., 2002. A Poisson log-bilinear approach to the construction of projected lifetables. Insurance Math. Econom. 31 (3), 373–393.

Cai, Z., 2007. Trending time-varying coefficient time series models with serially correlated errors. J. Econometrics 136, 163–188.

Cairns, A.J.G., Blake, D., Dowd, K., 2006. A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. J. Risk Insurance 73, 687–718.

Cairns, A.J.G., Blake, D., Dowd, K., Coughlan, G.D., Epstein, D., Ong, A., Balevich, I., 2009. A quantitative comparison of stochastic mortality models using data from England & Wales and the United States. N. Am. Actuar. J. 13 (1), 1–35.

Cannon, E., 2009. Estimation and pricing with the Cairns–Blake–Dowd model of mortality. Working Paper, University of Verona.

De Jong, P., Tickle, L., 2006. Extending the Lee–Carter model of mortality projection. Math. Popul. Stud. 13, 1–18.

Dickson, D.C.M., Hardy, M.R., Waters, H.R., 2009. Actuarial Mathematics for Life Contingent Risks. Cambridge University Press, London.

Dowd, K., Cairns, A.J.G., Blake, D., Coughlan, G.D., Epstein, D., Khalaf-Allah, M., 2010. Evaluating the goodness of fit of stochastic mortality models. Insurance Math. Econom. 47, 255–265.

Härdle, W., 1990. Applied Nonparametric Regression. Cambridge University Press, London.

Hastie, T., Tibshirani, R., 1993. Varying-coefficient models. J. R. Stat. Soc. Ser. B 55 (4), 757–796.

Hayashi, F., 2000. Econometrics. Princeton University Press, Princeton.

Lee, R.D., Carter, L.R., 1992. Modeling and forecasting US mortality. J. Amer. Statist. Assoc. 87, 659–675.

Li, D., Chen, J., Gao, J., 2011. Non-parametric time-varying coefficient panel data models with fixed effects. Econom. J. 14, 387–408.

O’Hare, C., Li, Y., 2012. Explaining young mortality. Insurance Math. Econom. 50 (1), 12–25.

Plat, R., 2009. On stochastic mortality modeling. Insurance Math. Econom. 45 (3), 393–404.

Renshaw, A.E., Haberman, S., 2006. A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insurance Math. Econom. 38, 556–570.

Robinson, P.M., 1989. Nonparametric estimation of time-varying parameters. In: Hackl, P. (Ed.), Statistical Analysis and Forecasting of Economics Structural Change. Springer, Berlin, pp. 253–264.

Shao, F., Li, J., Ma, S., Lee, M., 2014. Semiparametric varying-coefficient model for interval censored data with a cured proportion. Stat. Med. 33 (10), 1700–1712.

Sweeting, P.J., 2011. A trend-change extension of the Cairns–Blake–Dowd model. Ann. Actuar. Sci. 5, 143–162.