
Forecasting the Inevitable Consequence of Life: On the Implications of the Choice of a Mortality Model for the Pricing of Annuities

Shady el-Gewily

January 7, 2018

Master’s Thesis Econometrics, Operations Research and Actuarial Studies

Supervisors: Prof. Dr. F. Janssen, Dr. A. Bardoutsos, Dr. C. Praagman


Abstract

Underestimation of future longevity can have adverse consequences for the providers of life benefits, such as an annuity. Therefore, mortality models that produce high-quality forecasts and that can appropriately quantify the uncertainty associated with forecasts are important for the financial health of such institutions. The pricing of an annuity requires accurate mortality forecasts and associated levels of uncertainty. There is a wide array of mortality models available, which can produce substantially different point forecasts and levels of uncertainty associated with forecasts. The main objective of this thesis is to evaluate to what extent model choice affects the price of annuities and how the similarities and differences can be attributed.

Age-specific all-cause mortality rates for the England and Wales male population by single year of age are obtained from the Human Mortality Database. The models employed in this thesis are the Lee-Carter, Cairns-Blake-Dowd M7 and Hyndman-Ullah models and a two-dimensional kernel regression model proposed by Li et al. (2016). These models are calibrated on the England and Wales male mortality experience encompassing the calendar years 1961-2004 and the pensioner ages 60-89. The calibrated models are first compared on the basis of a set of selected qualitative and quantitative criteria that reveal relative merits and idiosyncratic characteristics of the models. The models are then used to estimate the density of the random present value of several annuities. The estimated density functions are compared and similarities and differences are analyzed with the insights derived from the model comparison.

The mortality models produce economically significant differences in annuity prices of up to 28.6%, 24.2% and 30% when pricing using the mean, 90th and 95th percentile, respectively. These differences can be attributed to idiosyncrasies of the models related to mortality forecasts and prediction intervals. As annuity prices become more dependent on long-term forecasts, differences in annuity prices get larger, which is due to diverging forecasts between models. All four models exhibit implausible features either in point forecasts or prediction intervals and there is a general tendency of the selected mortality models to overestimate future mortality rates. Annuity providers are recommended to use conservative annuity prices, based on the 90th or 95th percentile, to avoid underestimating future liabilities. The two-dimensional kernel model is unsuitable to calculate such prices and is rejected on this basis despite having the best goodness-of-fit and out-of-sample forecasting performance.

Keywords: Mortality models, annuity pricing, density of random present value, model risk, longevity risk, qualitative and quantitative comparison.

Contents

1 Introduction
2 Methods and Data
    2.1 Terminology, Notation and Assumption
    2.2 Mortality Models
        2.2.1 Lee-Carter under a Poisson Setting
        2.2.2 Cairns-Blake-Dowd M7 Extension
        2.2.3 Hyndman and Ullah
        2.2.4 Li, O’Hare and Vahid
        2.2.5 Characteristics of the Mortality Models
        2.2.6 Modeling Mortality at Very Old Ages
    2.3 Description of the Data
    2.4 Research Method
3 Empirical Results
    3.1 Model Comparison
    3.2 Annuity Pricing
4 Conclusion
5 References
6 Appendix

1 Introduction

Studies have revealed a persistent downward trend in mortality rates and near linear improvements in life expectancy (McDonald et al., 1998; Oeppen and Vaupel, 2002; de Beer et al., 2017). The improvements in longevity, while good for society as a whole, have led to a range of social, political, economic and regulatory challenges (Barrieu et al., 2012). In particular, improved longevity brings with it increased costs for social security and health care benefits as well as higher pension liabilities. One of the consequences has been an increased retirement age in some countries, such as the United Kingdom and the Netherlands. The improvements in longevity were underestimated by forecasts produced by researchers in the previous century, as pointed out by e.g., Wong-Fupuy and Haberman (2004) and Pitacco et al. (2009). When improvements in longevity are unanticipated, the balance sheets of public retirement systems, private annuity providers and insurers are threatened, because they will have to pay out more in social security benefits and pensions than expected (International Monetary Fund, 2012). The risk of underestimating longevity, longevity risk, is thus an important factor in decision-making regarding pricing and reserving for pensions and life annuities.

Life annuities are an important investment product that offers retirees the opportunity to insure against the risk of outliving their assets. Annuities involve the transfer of longevity risk from the policyholder to the annuity provider. A life annuity provides a guaranteed income for the remainder of the policyholder’s lifetime in exchange for a lump-sum payment. In order to safeguard its financial health, the firm that sells the annuity has to ensure it is able to pay out its future liabilities. The firm can protect itself by setting annuity prices sufficiently high, ensuring it has adequate financial buffers and cross-hedging or transferring part of the longevity risk to reinsurance or the financial markets. The price of an annuity is directly related to the descriptive statistics (i.e., the expected value, 90th or 95th percentile) of the random present value of its future cash flows. The random present value is a function of future survival probabilities, which have to be accurately forecast by an appropriate mortality model.

This need has stimulated the development of mortality models, both deterministic and stochastic in nature. Deterministic mortality models include the classical mortality laws by Gompertz (1825), Makeham (1860), Heligman and Pollard (1980) and Kannisto (1994). Recent contributions to the deterministic mortality model literature include de Beer and Janssen (2016) and Li et al. (2016). The last 25 years have seen the introduction of a wide array of stochastic mortality models. In contrast to deterministic mortality models, stochastic mortality models specify the evolution of mortality over time as a stochastic process. Widely known and used stochastic mortality models are the Lee-Carter model and its extensions (Lee and Carter (1992); Brouhns et al. (2002b); Renshaw and Haberman (2003, 2006); Hyndman and Ullah (2007); Continuous Mortality Investigation Bureau (2005, 2006)); the family of Cairns-Blake-Dowd models (Cairns et al. (2006, 2009)); the P-splines model (Currie et al. (2004); Currie (2006); Continuous Mortality Investigation Bureau (2005, 2006)), and the model proposed by Plat (2009).


Mortality models vary according to a number of important elements: forecasting methodology, number of sources of randomness driving mortality improvements at different ages, assumptions of smoothness in the age and time dimensions, inclusion or not of year of birth effects and the calibration method. As a result, mortality models may produce substantial differences in forecasts, which are propagated to the annuity price. A number of studies have been performed to compare mortality models, both on quantitative (e.g., goodness-of-fit, forecasting performance) and qualitative aspects (e.g., parsimony, transparency, ability to produce simulated sample paths and forecast percentiles, plausibility of forecasts and prediction intervals).

Dowd et al. (2010a,b) carried out a range of formal, out-of-sample backtesting and goodness-of-fit tests for various mortality models. Cairns et al. (2009) compared eight stochastic mortality models based on their general characteristics and ability to explain historical patterns of mortality. Cairns et al. (2011) introduced a number of additional qualitative criteria focused on the plausibility of forecasts and prediction intervals and compared six stochastic mortality models. It is argued that the proposed qualitative criteria are important because a model might perform well in terms of quantitative performance, yet produce forecasts exhibiting features which are clearly implausible. Other comparative studies include Booth et al. (2006), Hyndman and Ullah (2007), Plat (2009) and Li et al. (2016). There have also been several studies that discuss and evaluate the impact of model choice on the pricing of annuities. These studies primarily focus on quantitative aspects, see e.g., Brouhns et al. (2002b); Yang et al. (2010). The exceptions seem to be Cairns et al. (2009) and Cairns et al. (2011), which also attribute differences and similarities using insights obtained from their proposed qualitative criteria. The criteria put forth by Cairns et al. (2009, 2011) provide a framework to attribute similarities and differences in annuity prices.

The numerous available mortality models and the differences in their characteristics inspire the research question:

Research Question 1. To what extent does model choice affect the price of annuities and how can the similarities and differences be attributed?

To answer the research question, four mortality models are applied to forecast mortality rates and one-year death probabilities by single year of age for the England and Wales male population. This population is also used by Cairns et al. (2009, 2011), which facilitates comparison. From the wide range of models available, I selected the Lee-Carter model under a Poisson setting (Brouhns et al., 2002b), the Cairns-Blake-Dowd M7 model (Cairns et al., 2009), the Hyndman-Ullah model (Hyndman and Ullah, 2007) and the two-dimensional kernel model proposed by Li et al. (2016). These models have been selected because researchers have applied them to forecast mortality for the England and Wales male population before. At the same time, the models differ substantially in terms of their general characteristics, which makes them suitable to answer the research question.


Based on the forecasts produced by the selected mortality models, the density function of the annuity present value is estimated. Annuity prices are calculated using three common pricing strategies, namely using the mean and the 90th and 95th percentiles of the present value. The annuity prices will be compared quantitatively to shed light on the economic significance of model risk when pricing annuities. Moreover, the qualitative criteria put forth by Cairns et al. (2009) and Cairns et al. (2011) assist in attributing similarities and differences between annuity present values produced by different mortality models. This sheds light not only on the extent to which annuity prices are similar or different, but also on how such similarities and differences are generated.

This thesis adds to the previous literature in three ways. First, to my knowledge, the criteria put forth by Cairns et al. (2009) and Cairns et al. (2011) have not been applied to the Hyndman and Ullah (2007) and Li et al. (2016) models. Second, in these studies the age effects are not extrapolated to include ages above 89. Given the recent improvements in longevity, it seems appropriate to include ages 90+ in order to avoid underestimation of annuity prices. In this thesis, the Kannisto (1994) law of mortality is used to extrapolate age effects to the age range 90-120. Finally, it appears that an implementation of the two-dimensional kernel model was not publicly available. An implementation in R is made available alongside this thesis, which facilitates replication and can assist in future research that makes use of this model.

The rest of the thesis is structured as follows. Chapter 2 provides a more detailed account of the methods and data used in the empirical study. The empirical results are presented in Chapter 3. The answer to the research question, the most important insights and directions for future research are presented in Chapter 4. The Appendix provides mathematical details and directions to replicate the findings in this thesis.


2 Methods and Data

Section 2.1 introduces the terminology, notation and assumption for fractional ages and durations, which are used throughout this thesis. A more formal mathematical treatment can be found in Pitacco et al. (2009). Section 2.2 gives a description of the mortality models applied in this thesis. Section 2.3 gives a description of the data used in the empirical analysis. The research method used to answer the research question is presented in Section 2.4.

2.1 Terminology, Notation and Assumption

The one-year death probability $q_{x,t}$ denotes the probability that an individual of exact age $x$ at time $t$ dies before reaching age $x + 1$, where $x, t \in \mathbb{R}$. The corresponding one-year survival probability is $p_{x,t} = 1 - q_{x,t}$. The force of mortality $\mu_{x,t}$ represents the instantaneous rate of mortality at a given age $x$ and time $t$. The behaviour of the force of mortality on the age interval $(x, x + 1)$ and the time interval $(t, t + 1)$ can be summarized by the (central) mortality rate $m_{x,t}$ at age $x$ and time $t$. The central mortality rate is the weighted arithmetic mean of the force of mortality over the area defined by the intervals $(x, x + 1)$ and $(t, t + 1)$, the weighting function being the probability of being alive at age $x + u$ and time $t + s$, $0 < u, s \le 1$.

Mortality statistics are generally published for integer ages and calendar years only. Therefore, an assumption for fractional values of $x$ and $t$ is needed. Throughout this thesis, a piece-wise constant force of mortality is assumed, which is frequently adopted in actuarial calculations. This assumption entails that the force of mortality remains constant over squares of age and time, but is allowed to vary from one square to the next. Specifically, we assume that

Assumption 1. $\mu_{x+\xi_1,t+\xi_2} = \mu_{x,t}$ for $0 \le \xi_1 < 1$ and $0 \le \xi_2 < 1$.

Under Assumption 1, we have for integer age $x$ and calendar year $t$ that

\[
q_{x,t} = 1 - \exp[-\mu_{x,t}], \quad \text{and} \quad \mu_{x,t} = m_{x,t},
\]

see Pitacco et al. (2009) for a proof. From now on, age $x$ and calendar year $t$ are meant to be integer values. The fact that $\mu_{x,t} = m_{x,t}$ under Assumption 1 is useful because mortality rates are much easier to estimate from observed mortality statistics than forces of mortality.
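As an illustration of the relation between mortality rates and death probabilities under Assumption 1, the following minimal Python sketch converts central mortality rates into one-year death probabilities. The function names and the example rates are hypothetical and chosen for illustration only; the thesis itself provides an R implementation that is not reproduced here.

```python
import numpy as np

def death_prob_from_rate(m):
    """One-year death probability q = 1 - exp(-m) under a piece-wise constant force of mortality."""
    m = np.asarray(m, dtype=float)
    return -np.expm1(-m)  # numerically stable form of 1 - exp(-m)

def survival_prob(m, duration=1.0):
    """Probability of surviving `duration` years (0 <= duration <= 1),
    starting at integer age and time, so that the force of mortality stays constant."""
    m = np.asarray(m, dtype=float)
    return np.exp(-duration * m)

# Hypothetical central mortality rates, for illustration only.
rates = np.array([0.01, 0.05, 0.20])
print(death_prob_from_rate(rates))
print(survival_prob(rates, duration=0.5))
```

For small rates the death probability is close to the mortality rate itself, while for large rates the two diverge noticeably.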

The initial exposure to risk refers to the total number of individuals aged $x$ alive at the start of the calendar year $t$. The central exposure-to-risk at age $x$ last birthday during year $t$, denoted by $\mathrm{CETR}_{x,t}$, refers to the total time lived by people aged $x$ last birthday in calendar year $t$. Empirical mortality rates can be estimated from observed data as follows:

\[
m_{x,t} = \frac{d_{x,t}}{\mathrm{CETR}_{x,t}},
\]

where $d_{x,t}$ is the recorded number of deaths of individuals aged $x$ last birthday in calendar year $t$.
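The estimator of the empirical mortality rate can be sketched in a few lines of Python. The function name and the input arrays are hypothetical; cells with zero exposure are set to missing rather than dividing by zero, which is a choice made for this sketch and not something prescribed by the thesis.

```python
import numpy as np

def empirical_mortality_rates(deaths, exposures):
    """Empirical central mortality rates m = d / CETR on an age-by-year grid.

    Cells with zero central exposure are returned as NaN.
    """
    deaths = np.asarray(deaths, dtype=float)
    exposures = np.asarray(exposures, dtype=float)
    rates = np.full(deaths.shape, np.nan)
    np.divide(deaths, exposures, out=rates, where=exposures > 0)
    return rates

# Hypothetical death counts and central exposures (rows: ages, columns: years).
deaths = np.array([[12.0, 15.0], [30.0, 28.0]])
exposures = np.array([[1000.0, 1100.0], [900.0, 0.0]])
print(empirical_mortality_rates(deaths, exposures))
```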


2.2 Mortality Models

2.2.1 Lee-Carter under a Poisson Setting

The method proposed by Lee and Carter (1992), henceforth LC, assumes the following log-bilinear model for the age-specific force of mortality $\mu_{x,t}$,

\[
\log \mu_{x,t} = \alpha_x + \beta_x \kappa_t. \tag{1}
\]

The age effects are captured by the sequences $\alpha_x$ and $\beta_x$, whereas the calendar year effects are captured by the single time index $\kappa_t$. The time index $\kappa_t$ is modelled as a multiplicative factor and estimated together with the $\beta_x$. The age effect $\alpha_x$ is assumed to be constant over time. Since there are no known covariates on the right hand side of (1), the parameters cannot be obtained by using ordinary regression techniques. Lee and Carter (1992) proposed to obtain a least-squares solution by using the first element of a singular value decomposition (SVD) in combination with a set of identifiability constraints. Once the parameters have been estimated, a time-series model for $\kappa_t$ is specified and forecasts of mortality rates can be obtained. The SVD calibration method requires the assumption that the random errors are homoskedastic. This is an unrealistic assumption, because the logarithm of the observed mortality rates is much more variable at older ages than at younger ages due to the much smaller exposure-to-risk, and thus death numbers, observed at high ages.

To allow for heteroskedastic errors, alternative calibration techniques have been investigated in e.g., Brouhns et al. (2002a,b). Instead of an additive error structure, deaths are assumed to be independently distributed as Poisson random variables. The Poisson assumption on the random number of deaths at age $x$ in calendar year $t$, $D_{x,t}$, implies

\[
D_{x,t} \sim \mathrm{Poisson}\left(\mathrm{CETR}_{x,t}\,\mu_{x,t}\right).
\]

The method assumes the same log-bilinear model for the force of mortality $\mu_{x,t}$ specified in equation (1). By virtue of Assumption 1 the model can be calibrated using mortality rates $m_{x,t}$.

The parameters $\alpha_x$, $\beta_x$ and $\kappa_t$ are calibrated by Maximum Likelihood Estimation (MLE). The parameters are only identifiable up to a set of parameter constraints, which need to be specified. Following Lee and Carter (1992), the following constraints are imposed:

\[
\sum_x \beta_x = 1 \quad \text{and} \quad \sum_t \kappa_t = 0.
\]

These constraints imply that the sequence of $\alpha_x$’s describes the time averages of the age-pattern of $\log m_{x,t}$ for ages $x$. The time index $\kappa_t$ describes the change in the level of mortality rates over time. The $\beta_x$ profile describes which mortality rates decline rapidly and which decline slowly over time. The age effects are non-parametric and estimated from historical data. This implies that the age effects, in particular the $\beta_x$ profile, do not exhibit a smooth progression with age. As a result, estimated mortality rates generally do not progress smoothly with age.
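To make the role of the identifiability constraints concrete, the sketch below fits the log-bilinear structure with the original SVD approach of Lee and Carter (1992). It is not the Poisson maximum likelihood calibration used in this thesis; it only illustrates how the age effects and the time index are extracted and rescaled so that the age sensitivities sum to one and the time index sums to zero. The function name and the input layout (ages in rows, calendar years in columns) are assumptions made for this illustration.

```python
import numpy as np

def fit_lee_carter_svd(log_m):
    """Least-squares Lee-Carter fit via SVD (not the Poisson MLE).

    log_m: 2-D array of log mortality rates, shape (n_ages, n_years).
    Returns (alpha, beta, kappa) with sum(beta) = 1 and sum(kappa) = 0.
    """
    log_m = np.asarray(log_m, dtype=float)
    alpha = log_m.mean(axis=1)              # time average per age
    centered = log_m - alpha[:, None]       # rows now sum to zero over time
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    beta = U[:, 0]
    kappa = s[0] * Vt[0]
    scale = beta.sum()                      # rescale so that sum(beta) = 1
    return alpha, beta / scale, kappa * scale
```

Because each row of the centered matrix sums to zero over time, the leading right singular vector is orthogonal to the constant vector, so the resulting time index sums to zero up to rounding error.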


To obtain forecast logarithms of age-specific mortality rates, an appropriate ARIMA model is specified for the calibrated calendar year effects $\kappa_t$. Following Lee and Carter (1992), the time index $\kappa_t$ is modelled as a random walk with drift. Then for future calendar years, point forecasts and prediction intervals of logarithms of mortality rates can be obtained by applying the forecasting procedure described in the Appendix. For ease of notation, we use the mathematical notation $m_{x,t}$ for both the estimated mortality rates and the forecast mortality rates.
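The following sketch shows how a random walk with drift produces point forecasts and approximate prediction intervals for a time index. It is a generic textbook version, not the procedure from the Appendix: it treats the estimated drift as known and therefore ignores parameter uncertainty, and the function name and the normal quantile of 1.96 are assumptions of this example.

```python
import numpy as np

def rw_drift_forecast(kappa, horizon, z=1.96):
    """Forecast a random walk with drift h = 1..horizon steps ahead.

    Returns (point, lower, upper); the interval only reflects innovation
    variance, not the uncertainty in the estimated drift.
    """
    kappa = np.asarray(kappa, dtype=float)
    diffs = np.diff(kappa)
    drift = diffs.mean()                 # estimated drift
    sigma = diffs.std(ddof=1)            # innovation standard deviation
    h = np.arange(1, horizon + 1)
    point = kappa[-1] + drift * h
    half_width = z * sigma * np.sqrt(h)  # variance grows linearly in h
    return point, point - half_width, point + half_width

# Hypothetical time index, for illustration only.
kappa = np.array([0.0, -1.0, -2.5, -3.0, -4.5])
point, lower, upper = rw_drift_forecast(kappa, horizon=3)
print(point)
```

The widening of the interval with the square root of the horizon is the reason long-term forecasts carry substantially more uncertainty than short-term ones.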

2.2.2 Cairns-Blake-Dowd M7 Extension

The Cairns-Blake-Dowd model (henceforth, CBD) proposed by Cairns et al. (2006) models changes in logit-transformed age-specific one-year death probabilities. It is appropriate for pensioner ages only, say $x > 60$. The model is based on the empirical observation that the logit-transformed death probabilities, $\log \frac{q_{x,t}}{1 - q_{x,t}}$, are reasonably linear in $x$ for this age range. In contrast to the LC method, the CBD model treats age as a continuous covariate and incorporates two time indices for the calendar year effects. Empirical studies have shown that death probabilities have an imperfect correlation at different ages from one year to the next. A model coherent with this observation requires a minimum of two time indices.

Sometimes the logit-transformed death probabilities exhibit a slight curvature after the retirement age. This curvature can be captured by including a quadratic age term. For some countries, individuals with the same year of birth share similar mortality patterns (see e.g., Willets (2004), MacMinn et al. (2005) and Richards et al. (2006)). For these countries, age and calendar year effects are not sufficient and cohort effects must be incorporated. The CBD model can be generalized to include both a quadratic age term and a cohort effect, see Cairns et al. (2009). This specification (henceforth, CBD M7) assumes the following model for age-specific death probabilities:

\[
\operatorname{logit} q_{x,t} = \kappa^{(1)}_t + \kappa^{(2)}_t (x - \bar{x}) + \kappa^{(3)}_t \left( (x - \bar{x})^2 - \hat{\sigma}^2_x \right) + \gamma_c, \tag{2}
\]

where $\kappa^{(1)}_t$, $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$ are the calendar year effects, $\gamma_c$ is the cohort effect and $\hat{\sigma}^2_x$ is the mean value of $(x - \bar{x})^2$. The age effects assume a functional form and do not have to be estimated from historical data. This implies that the model produces estimated and forecast one-year death probabilities that progress smoothly with age.
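The CBD M7 predictor in equation (2) can be evaluated directly once values for the period and cohort effects are available. The sketch below does this for a single calendar year. It assumes, as is conventional for this model family, that the centring constant is the mean age over the calibration age range; the function name and the parameter values are hypothetical.

```python
import numpy as np

def cbd_m7_death_prob(ages, kappa1, kappa2, kappa3, gamma_c=0.0):
    """Death probabilities implied by the CBD M7 predictor for one calendar year.

    ages: ages in the calibration range; their mean is used as the centring age
    (an assumption of this sketch). gamma_c: cohort effect(s), scalar or one per age.
    """
    ages = np.asarray(ages, dtype=float)
    x_bar = ages.mean()
    sigma2 = np.mean((ages - x_bar) ** 2)   # mean of squared centred ages
    eta = (kappa1
           + kappa2 * (ages - x_bar)
           + kappa3 * ((ages - x_bar) ** 2 - sigma2)
           + gamma_c)
    return 1.0 / (1.0 + np.exp(-eta))       # inverse logit

ages = np.arange(60, 90)
# Hypothetical period effects, for illustration only.
q = cbd_m7_death_prob(ages, kappa1=-3.0, kappa2=0.1, kappa3=0.0)
print(q[:3])
```

Since the age terms are fixed functions of age, the resulting death probabilities vary smoothly across ages by construction.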

The CBD M7 model can be calibrated using a variety of statistical methods, such as OLS or Maximum Likelihood, see e.g., Pitacco et al. (2009). In this thesis, the approach of Villegas et al. (2015) is followed and death counts $D_{x,t}$ are assumed to be independent and distributed as Binomial random variables,

\[
D_{x,t} \sim \mathrm{Binomial}\left(\mathrm{IETR}_{x,t},\, q_{x,t}\right),
\]

where $\mathrm{IETR}_{x,t}$ are the initial exposures-to-risk.

Following Villegas et al. (2015), the initial exposures-to-risk are obtained by using the approximation $\mathrm{IETR}_{x,t} \approx \mathrm{CETR}_{x,t} + 0.5\, d_{x,t}$, with $d_{x,t}$ the observed death counts. The model is calibrated on empirical one-year death probabilities, calculated as $q_{x,t} = \frac{d_{x,t}}{\mathrm{IETR}_{x,t}}$.
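The approximation of the initial exposures and the resulting empirical death probabilities can be sketched as follows. The function names are hypothetical and the numbers are made up for illustration.

```python
import numpy as np

def initial_exposure(central_exposure, deaths):
    """Approximate initial exposure-to-risk: IETR = CETR + 0.5 * d."""
    return np.asarray(central_exposure, dtype=float) + 0.5 * np.asarray(deaths, dtype=float)

def empirical_death_prob(central_exposure, deaths):
    """Empirical one-year death probability q = d / IETR."""
    return np.asarray(deaths, dtype=float) / initial_exposure(central_exposure, deaths)

def logit(q):
    """Logit transform log(q / (1 - q)) used on the left-hand side of the CBD models."""
    q = np.asarray(q, dtype=float)
    return np.log(q / (1.0 - q))

# Hypothetical cell: 1000 person-years of central exposure and 20 deaths.
q = empirical_death_prob(1000.0, 20.0)
print(q, logit(q))
```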

The parameters are only identified up to a transformation. Cairns et al. (2009) suggested the following parameter constraints to ensure identifiability:

\[
\sum_c \gamma_c = 0, \quad \sum_c c\,\gamma_c = 0 \quad \text{and} \quad \sum_c c^2 \gamma_c = 0,
\]

where the summation is over all observed cohorts $c$. These constraints ensure that the cohort effect $\gamma_c$ fluctuates around zero and has no detectable linear or quadratic trend.

Forecasts of the future death probabilities $q_{x,t+h}$ can be obtained by specifying time-series dynamics for $\kappa^{(1)}_t$, $\kappa^{(2)}_t$, $\kappa^{(3)}_t$ and $\gamma_c$. The calibrated calendar year effects $\kappa^{(1)}_t$, $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$ are modelled as a multivariate random walk with drift, which is the standard approach in the literature, see e.g., Cairns et al. (2006, 2011) and Haberman and Renshaw (2011). Following Cairns et al. (2011), an AR(1) process with a constant is used for the cohort index, which is independent from the dynamics of the time indices. The assumption of independence of the dynamics of the period and cohort indices follows previous studies (e.g., Renshaw and Haberman (2006) and Continuous Mortality Investigation Bureau (2006)). Sample paths of future mortality rates can then be simulated and prediction intervals can be obtained. Details of the forecasting procedure are presented in the Appendix. For ease of notation, the mathematical notation $q_{x,t}$ is used for both the estimated one-year death probabilities and the forecast one-year death probabilities.
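A minimal simulation sketch of these dynamics is given below: a multivariate random walk with drift for the period effects and an independent AR(1) process with a constant for the cohort effect. It is a generic illustration rather than the procedure from the Appendix; the function names, parameter names and all numerical values are hypothetical, and parameter uncertainty is ignored.

```python
import numpy as np

def simulate_rw_drift(last, drift, cov, horizon, n_paths, rng):
    """Simulate a multivariate random walk with drift.

    Returns an array of shape (n_paths, horizon, k), k being the number of indices.
    """
    last = np.asarray(last, dtype=float)
    drift = np.asarray(drift, dtype=float)
    shocks = rng.multivariate_normal(np.zeros(last.size), cov, size=(n_paths, horizon))
    return last + np.cumsum(drift + shocks, axis=1)

def simulate_ar1(last, const, phi, sigma, horizon, n_paths, rng):
    """Simulate an AR(1) process with a constant; returns shape (n_paths, horizon)."""
    paths = np.empty((n_paths, horizon))
    prev = np.full(n_paths, float(last))
    for h in range(horizon):
        prev = const + phi * prev + sigma * rng.standard_normal(n_paths)
        paths[:, h] = prev
    return paths

rng = np.random.default_rng(0)
# Hypothetical last observed values, drifts and innovation covariance.
kappa_paths = simulate_rw_drift([-3.0, 0.1, 0.0], [-0.02, 0.001, 0.0],
                                1e-4 * np.eye(3), horizon=10, n_paths=50, rng=rng)
gamma_paths = simulate_ar1(0.05, const=0.0, phi=0.8, sigma=0.01,
                           horizon=10, n_paths=50, rng=rng)
print(kappa_paths.shape, gamma_paths.shape)
```

Percentiles of quantities computed along the simulated paths then serve as prediction intervals.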

2.2.3 Hyndman and Ullah

Hyndman and Ullah (2007) proposed a method (henceforth, HU) that extends the LC methodology in two important ways. First, it allows for more than one principal component. Second, it produces mortality rates that exhibit a smooth progression over age. The motivation for smoothing in actuarial applications is as follows. Empirical mortality rates are subject to sampling errors, which do not necessarily reveal systematic features of the risk that an annuity provider faces. When no smoothness over age is imposed, forecast mortality rates may show an erratic variation across ages that is passed on to price lists and balance sheets. Such irregularities in the price lists and balance sheets are considered undesirable (Pitacco et al., 2009). To reduce sampling error and produce mortality rates that show a smooth progression with age, some actuaries prefer models that pre-smooth empirical mortality rates before model calibration. The Hyndman and Ullah (2007) model is one such model.

Essentially, the HU method assumes that there is some underlying smooth function $f_t(x)$ that is observed with error, relating the logarithms of mortality rates to age. To obtain $f_t(x)$, age-specific mortality rates $m_{x,t}$ are first smoothed using weighted penalized regression splines (Wood, 2003), for each calendar year $t$ separately. This smoothing method is univariate and non-parametric in nature and it allows for monotonicity constraints. The resulting mortality rates progress smoothly over age and are monotonically increasing after some age threshold.

Once the smoothed mortality rates $f_t(x)$ have been obtained, they are decomposed using a basis function expansion using the following model:

\[
\log(f_{x,t}) = \theta_x + \sum_{k=1}^{K} \beta_{t,k}\, \varphi_{x,k} + \varepsilon_{x,t},
\]

where $\theta_x$ is a measure of location of $f_{x,t}$, $\{\varphi_{x,k}\}$ is a set of orthonormal basis functions with corresponding coefficients $\{\beta_{t,k}\}$ and $\varepsilon_{x,t} \sim \mathcal{N}(0, v(x))$. Robust principal components analysis (Robust PCA) is used to estimate the age effects $\theta_x$ and $\{\varphi_{x,k}\}$, to avoid difficulties with outlying years. The calendar year effects $\beta_{t,k}$ are not estimated robustly, so that any outlying years will be modelled by outliers in the time series. There are several methods to obtain robust estimates of $\{\varphi_{x,k}\}$, as described in Hyndman and Ullah (2007). The hybrid algorithm proposed by Hyndman and Ullah (2007) is employed, which combines the best features of the available methods, making it an efficient and robust method for obtaining the basis functions $\{\varphi_{x,k}\}$. The model parameters are fully identified without imposing parameter constraints.

Hyndman and Ullah (2007) proposed to select the number of basis functions $K$ by minimizing the Integrated Squared Forecast Error (ISFE). Since then, Hyndman et al. (2017) have recommended choosing a value of $K$ that is larger than strictly needed. The argument is that forecasting performance is not affected by using too many basis functions, but using too few basis functions has adverse effects on forecasting performance.

Because the calibration method yields basis functions $\{\varphi_{x,k}\}$ that are mutually uncorrelated, time series for $\{\beta_{t,k}\}$ can be modelled independently. This greatly simplifies the forecasting procedure. The calibrated calendar year effects $\{\beta_{t,k}\}$ are modelled as independent robust ARIMA models, which allows the fitted ARIMA models to contain outliers of various types so that unusual observations do not contaminate the forecasts (Chen and Liu, 1993). The ARIMA models are chosen optimally on the basis of the corrected version of the Akaike Information Criterion (AICc). Point forecasts, simulated sample paths and prediction intervals of future mortality rates $m_{x,t}$ can be obtained using the approach outlined in the Appendix.

2.2.4 Li, O’Hare and Vahid

Li et al. (2016) proposed a two-dimensional local constant kernel regression (henceforth, 2D KS) for the logarithms of age-specific mortality rates. This model allows for smoothness over age, time and cohort. The assumed model is as follows:

\[
\log(m_{x,t}) = \Gamma_{x,t} + \varepsilon_{x,t},
\]

where $\Gamma_{x,t}$ is an unknown smooth function of age $x$ and time $t$ and $\varepsilon_{x,t}$ are random errors.

Kernel regression is a non-parametric regression method, which means that in contrast to parametric regression methods, no functional form is specified a priori. The idea behind local constant kernel regression is that $\Gamma_{x,t}$ is assumed continuous and can be approximated by a constant over a local neighbourhood around $(x, t)$. This boils down to the estimate $\hat{\Gamma}_{x,t}$ being a weighted average of the observed values of $\log(m_{x,t})$ in a local neighbourhood around $(x, t)$.

To obtain the smooth surface $\Gamma_{x,t}$, the following weighted least squares minimization problem is solved for $x \in [0, 1]$ and $t \in [0, 1]$,

\[
\min_{\Gamma_{x,t}} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \log(m_{x_i,t_j}) - \Gamma_{x,t} \right)^2 K_{h_1,h_2,\rho}(x_i - x,\, t_j - t), \tag{3}
\]

where the summation is over all ages and calendar years used to calibrate the model. The ages $x_i$, $i = 1, \ldots, N$, and calendar years $t_j$, $j = 1, \ldots, T$, are normalized so that they span the interval $[0, 1]$. The function $K_{h_1,h_2,\rho}(\cdot)$ is the bivariate normal kernel function and its role is to serve as the weighting function. The solution to the minimization problem (3) at $(x, t)$ is given by the Nadaraya-Watson estimator (Nadaraya, 1964; Watson, 1964):

\[
\hat{\Gamma}_{x,t} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{T} K_{h_1,h_2,\rho}(x_i - x,\, t_j - t)\, \log(m_{x_i,t_j})}{\sum_{i=1}^{N} \sum_{j=1}^{T} K_{h_1,h_2,\rho}(x_i - x,\, t_j - t)}.
\]

The bivariate normal kernel function is dependent on three bandwidth parameters, $h_1 > 0$, $h_2 > 0$ and $\rho \in [0, 1)$, which determine the degree of smoothing in the age, time and cohort dimensions. One of the benefits of capturing cohort effects as a smoothness parameter is that it allows for the assessment of the strength of cohort effects between countries, see Li et al. (2016).
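The Nadaraya-Watson estimator can be sketched in Python as follows. The sketch assumes the standard bivariate normal density shape with scale parameters equal to the two bandwidths and correlation equal to the third parameter; the exact parametrization used by Li et al. (2016) and in the R implementation accompanying this thesis may differ. The normalizing constant of the kernel is omitted because it cancels in the ratio. Function names are hypothetical.

```python
import numpy as np

def bivariate_normal_kernel(u, v, h1, h2, rho):
    """Unnormalized bivariate normal kernel (assumed form; the constant cancels)."""
    z = (u / h1) ** 2 - 2.0 * rho * (u / h1) * (v / h2) + (v / h2) ** 2
    return np.exp(-z / (2.0 * (1.0 - rho ** 2)))

def nadaraya_watson_2d(log_m, h1, h2, rho):
    """Local constant kernel smoother on an age-by-year grid of log mortality rates.

    Ages (rows) and years (columns) are normalized to span [0, 1].
    """
    log_m = np.asarray(log_m, dtype=float)
    n_ages, n_years = log_m.shape
    x = np.linspace(0.0, 1.0, n_ages)
    t = np.linspace(0.0, 1.0, n_years)
    X, T = np.meshgrid(x, t, indexing="ij")
    smooth = np.empty_like(log_m)
    for i in range(n_ages):
        for j in range(n_years):
            w = bivariate_normal_kernel(X - x[i], T - t[j], h1, h2, rho)
            smooth[i, j] = np.sum(w * log_m) / np.sum(w)  # weighted average
    return smooth
```

A positive correlation parameter tilts the kernel along the diagonal of the age-time grid, so that observations belonging to nearby cohorts receive more weight. Very small bandwidths reproduce the raw data, whereas large bandwidths flatten the surface towards its overall mean.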

The performance of any non-parametric regression method depends critically on the choice of the bandwidths, see e.g., Li and Racine (2007). The bandwidths have to be chosen optimally in some sense. A common method to choose the optimal bandwidth parameters is (Generalized) Cross Validation. Li et al. (2016) propose choosing the bandwidths by minimizing the mean square forecast error (MSFE) with a 5 year time horizon. This entails producing a mortality forecast from 2000 to 2004 based on data up to 1999. The optimal bandwidth parameters are then found by the solution to:

\[
(h_1, h_2, \rho) = \arg\min_{h_1,h_2,\rho} \frac{1}{5N} \sum_{i=1}^{N} \sum_{j=T-4}^{T} \left( \log m_{x_i,t_j} - \hat{\Gamma}_{x_i,t_j} \right)^2,
\]

where $\hat{\Gamma}_{x_i,t_j}$ denotes the forecast for age $x_i$ and calendar year $t_j$.

Once the smoothed surface $\Gamma_{x,t}$ is estimated, one-step ahead forecasts can be obtained sequentially. The proposed method of forecasting is deterministic in nature, which implies that simulated sample paths and prediction intervals that properly reflect the stochastic nature of mortality are hard to obtain. A more detailed discussion of the forecasting procedure can be found in the Appendix.

2.2.5 Characteristics of the Mortality Models

The mortality models described in Sections 2.2.1-2.2.4 differ in a number of ways. To provide an overview of these differences, the general characteristics of the various mortality models are summarized in Table 1.

| Element | LC | CBD M7 | HU | 2D KS |
|---|---|---|---|---|
| Forecasting methodology | Stochastic | Stochastic | Stochastic | Deterministic |
| Effects | Age, calendar year | Age, calendar year, cohort | Age, calendar year | Age, calendar year, cohort |
| No. of sources driving mortality improvements at different ages | 1 | 4 | 4 | 1 |
| Assumptions of smoothness | None | Age | Age | Age, calendar year, cohort |
| Inclusion of cohort effects | No | Yes | No | Yes |
| Calibration method | MLE | MLE | Robust PCA | Least Squares |

Table 1: A summary of the characteristics of the LC, CBD M7, HU and 2D KS models.


2.2.6 Modeling Mortality at Very Old Ages

In order to prevent the underestimation of annuity prices, accurate forecasts of mortality for ages beyond 89 are required. Empirical mortality rates exhibit high variability in this age segment, due to the scarcity of lives at the advanced ages. Therefore, a method that extrapolates mortality rates to higher ages is required. The standard approach in the literature is to use some parametric law of mortality.

In a comparative analysis by Antonio (2012), the mortality law by Kannisto (1994) is chosen as the best among a set of various extrapolation approaches, at least for the Belgian population. The mortality law of Kannisto (1994) models, in a particular calendar year $t$, the force of mortality as:

\[
\mu_{x,t} = \frac{c\, e^{\omega x}}{1 + c\, e^{\omega x}}, \tag{4}
\]

where the parameters $c$ and $\omega$ have to be estimated from mortality data in the calendar year $t$. The Kannisto (1994) model is a parametric law of mortality of the logistic type, having the property that as age $x$ tends to infinity, the force of mortality converges to 1. The age range used to calibrate the Kannisto model has a slight effect on the extrapolated rates and an appropriate choice must be made. Following Antonio (2012), the Kannisto model is calibrated to the age range 75-89 and extrapolated to ages 90-120. The same extrapolation method is used for the LC, CBD M7, HU and 2D KS models to remain as consistent as possible.

The parameters c and ω can be estimated by means of non-linear (weighted) least squares regression or by maximum likelihood (see e.g., Gavrilova and Gavrilov (2014) and Thatcher et al. (1998)). In the monograph by Doray (2008), it is shown that a reparametrized version of equation (4) makes it possible to estimate the parameters using ordinary or weighted least squares. One of the benefits of estimating the parameters by least squares is the ease of implementation. Following Doray (2008), the parameters c and ω are calibrated for each calendar year t separately by rewriting equation (4) as the linear model

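$$\mathrm{logit}(\mu_{x,t}) = \log\frac{\mu_{x,t}}{1-\mu_{x,t}} = \log c + \omega x + \varepsilon_{x,t},$$

where εx,t is a random error. Assuming homoskedastic error terms, the intercept log c and the slope ω can be estimated using OLS, and c is recovered by exponentiating the estimated intercept.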
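The least-squares calibration and the extrapolation to ages 90-120 can be sketched as follows. This is a minimal illustration and not the code used in the thesis: the function names `fit_kannisto` and `kannisto`, and the synthetic parameter values, are assumptions introduced here, and the input rates are generated from the Kannisto law itself rather than taken from the HMD.

```python
import numpy as np

def fit_kannisto(ages, mu):
    """OLS fit of logit(mu_x) = log(c) + omega * x for one calendar year.

    Returns the pair (c, omega). Function name and interface are hypothetical.
    """
    ages = np.asarray(ages, dtype=float)
    mu = np.asarray(mu, dtype=float)
    y = np.log(mu / (1.0 - mu))              # logit of the force of mortality
    omega, log_c = np.polyfit(ages, y, 1)    # degree-1 fit: slope, then intercept
    return float(np.exp(log_c)), float(omega)

def kannisto(ages, c, omega):
    """Force of mortality under the Kannisto law, equation (4)."""
    z = c * np.exp(omega * np.asarray(ages, dtype=float))
    return z / (1.0 + z)

# Synthetic illustration (assumed values, not estimates from the thesis).
fit_ages = np.arange(75, 90)                 # calibration ages 75-89
true_c, true_omega = 2e-5, 0.10
mu_fit = kannisto(fit_ages, true_c, true_omega)

c_hat, omega_hat = fit_kannisto(fit_ages, mu_fit)
mu_old = kannisto(np.arange(90, 121), c_hat, omega_hat)  # extrapolated ages 90-120
```

Because the logistic form bounds the force of mortality by 1, the extrapolated values in `mu_old` increase with age but never exceed that limit.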


2.3 Description of the Data

The data consists of recorded death numbers and central exposures-to-risk for the England and Wales male population, which are provided by the Human Mortality Database (HMD)¹ and are publicly available. The England and Wales male population is chosen because an extensive literature on this population is available, which facilitates comparison of the results in this thesis with the established literature. The data encompasses the period 1861-2014 and age range 0-109. The recorded death numbers and central exposures-to-risk are used to calculate empirical unsmoothed all-cause mortality rates and one-year death probabilities by single year of age.

Figure 1 shows the log-transformed empirical mortality rates log mx,t in the calendar years 1961, 1990 and 2014. Each curve gives a snapshot of the mortality experience in a particular year. In each year, the mortality experience has the characteristic shape describing a typical mortality curve over age. Mortality rates are comparatively high in the first year after birth and decrease rapidly to a minimum at around age 10. Thereafter, mortality rates increase in an approximately exponential fashion, before decelerating at the end of the life span. The excess mortality observable in the so-called accident hump (ages 18-25) is primarily caused by accidents, injuries and suicides (Pitacco et al., 2009). The mortality curves show an erratic variation at the highest ages, which is caused by relatively large sampling errors due to small exposure sizes at this age range. While there are marked improvements in the mortality rates over time for all ages, the strength of improvement differs with age. Trends in one-year death probabilities are very similar and are not shown to save space.

Figure 1: Logarithms of observed empirical mortality rates log mx,t for calendar years 1961, 1990 and 2014 and age range 0-89.

¹ http://www.mortality.org, country code GBRTENW, retrieved on September 2, 2017.


Only part of the data is used to calibrate the selected mortality models, namely the period 1961-2004 and the age range 60-89. The age range starts at age 60, because the mortality forecasts are used to price annuities, which are generally applicable to retirees only. A second reason is that the CBD M7 model is applicable only to ages above 60. Due to contamination of mortality data by noise at the advanced ages (say x > 89), the mortality models are calibrated to the maximum age of 89. The Kannisto (1994) mortality law is used to extrapolate to higher ages. The calibration period 1961-2004 is chosen to be in line with Cairns et al. (2009) and Cairns et al. (2011). This period does not encompass the Second World War and the influenza pandemic of 1957, rare events that do not reveal structural risks to the annuity provider but have a large impact on forecasts. The last ten years of data (2005-2014) are not used for model calibration, but are used to assess the out-of-sample forecasting performance of the various mortality models.

2.4 Research Method

The objective of this thesis is to answer the research question:

To what extent does model choice affect the price of annuities and how can the similarities and differences be attributed?

The price of an annuity is directly related to summary statistics (i.e., the expected value, 90th or 95th percentile) of the random present value of its future cash flows. The random present value of an annuity that pays one euro at the end of every calendar year conditional on the survival of an annuitant aged x in calendar year t is defined as:

$$a_{x,t} = \sum_{k \ge 0} \left\{ \prod_{j=0}^{k} p_{x+j,\,t+j} \right\} v^{k+1}. \tag{5}$$

The formula for the random present value depends on a set of future survival probabilities {px,t+h} and a discount factor v. The discount factor is v = 1/(1 + i), where the expected interest rate is set to i = 0.02 for convenience. The future survival probabilities are unknown and need to be forecast. These forecasts are provided by the LC, CBD M7, HU and 2D KS models.
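Formula (5) can be evaluated for one path of survival probabilities with a cumulative product. The sketch below is illustrative only: the function name `annuity_present_value` is introduced here, and the input is assumed to be the sequence of one-year survival probabilities along the diagonal (age x+j in year t+j), truncated at the last age for which a probability is supplied.

```python
import numpy as np

def annuity_present_value(p_diag, i=0.02):
    """Present value of formula (5) for one path of survival probabilities.

    p_diag[j] is the one-year survival probability at age x+j in year t+j.
    The sum is truncated at the length of p_diag (an assumption of this sketch).
    """
    p_diag = np.asarray(p_diag, dtype=float)
    v = 1.0 / (1.0 + i)                             # one-year discount factor
    survival = np.cumprod(p_diag)                   # prod_{j=0}^{k} p_{x+j,t+j} for k = 0, 1, ...
    discount = v ** np.arange(1, len(p_diag) + 1)   # v^(k+1)
    return float(np.sum(survival * discount))

# Sanity check: certain survival for 10 years gives a 10-year annuity-certain.
pv_certain = annuity_present_value(np.ones(10), i=0.02)
```

With certain survival the result reduces to the annuity-certain value (1 - v^n)/i, which is a convenient check on the indexing of the discount factor.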

The mortality models are calibrated on the England and Wales male mortality experience encompassing the calendar years 1961-2004 and the pensioner ages 60-89. The mortality models are then used to forecast either age-specific log-transformed mortality rates (LC, HU, 2D KS) or logit-transformed one-year death probabilities (CBD M7). Subsequently, the Kannisto (1994) mortality law is used to extrapolate to the advanced ages 90-120 for each simulated sample path and calendar year separately, so that annuity present values are calculated on the full pensioner age range 60-120. Once the forecasts are obtained, they are translated to one-year survival probabilities using the relationships presented in Section 2.1.


Applying formula (5) using point forecasts of survival probabilities yields accurate estimates of ax,t only if future mortality evolves according to the point forecast with certainty. This is clearly unrealistic, because there is uncertainty about the future evolution of mortality. To properly reflect the stochastic nature of mortality, formula (5) needs to be considered under many different sample paths that are likely to occur. When the mortality model allows it, simulated sample paths can be used to estimate the entire density function of ax,t rather than just a point estimate. To evaluate the implications of model risk on annuity prices, three common pricing strategies are considered:

1. Pricing the annuity without any safety loading, i.e., the mean value of ax,t is used.

2. A safety loading is added by setting the price of the annuity to equal the 90th percentile of ax,t.

3. A safety loading is added by setting the price of the annuity to equal the 95th percentile of ax,t.
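Given a sample of simulated present values, the three pricing strategies reduce to a mean and two empirical percentiles. The following sketch uses synthetic placeholder draws, not results from any of the mortality models; the function name `annuity_prices` and the normal placeholder distribution are assumptions made for illustration.

```python
import numpy as np

def annuity_prices(pv_samples):
    """Mean, 90th and 95th percentile of simulated annuity present values."""
    pv = np.asarray(pv_samples, dtype=float)
    return {
        "mean": float(pv.mean()),             # strategy 1: no safety loading
        "p90": float(np.percentile(pv, 90)),  # strategy 2: 90th percentile
        "p95": float(np.percentile(pv, 95)),  # strategy 3: 95th percentile
    }

# Placeholder sample standing in for simulated present values (synthetic).
rng = np.random.default_rng(0)
pv_samples = rng.normal(loc=15.0, scale=0.5, size=10_000)
prices = annuity_prices(pv_samples)
```

The gap between the percentile-based prices and the mean is the safety loading, so a model with a wider estimated density of ax,t produces a larger loading even when its mean agrees with that of other models.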

Section 2.2 has made clear that the mortality models differ in a number of ways. This implies that forecasts produced by the various models might differ substantially as a result, which clearly impacts the estimated density function of ax,t and its price. In order to explain the similarities and differences in annuity prices produced by the various mortality models, a more formal model comparison is needed.

To evaluate the quality of forecasts, many evaluation criteria on both qualitative and quantitative aspects have been suggested in the literature, see e.g., Cairns et al. (2008, 2009, 2011), Plat (2009), Dowd et al. (2010a,b) and Haberman and Renshaw (2011). A particular model may outperform alternative models both in terms of in-sample goodness-of-fit and out-of-sample forecasting performance, yet produce implausible forecasts or prediction intervals (Cairns et al., 2011). Therefore, it is important to study and compare the mortality models both qualitatively and quantitatively. The criteria suggested in the literature provide a framework to understand and explain potential similarities and differences that might arise in annuity prices. Not all of the criteria put forth in the literature contribute to explaining the potential similarities and differences between annuity prices to the same extent. Therefore, only the criteria that contribute the most to answering the research question are considered. The selected criteria are:

• The model should have high goodness-of-fit to the calibration sample.

• Residuals by age, calendar year and year of birth should be pattern-free.

• At least for some countries, the model should incorporate a cohort effect.

• The model should be transparent.

• Point forecasts should be plausible and consistent with historical trends.

• Point forecasts should have good out-of-sample forecasting performance.

• It should be possible to use the model to generate sample paths and calculate prediction intervals.


• The structure of the model should make it possible to incorporate parameter uncertainty in simulations.

• Forecast levels of uncertainty should be plausible and consistent with observed variability in mortality data.

A good mortality model is consistent with historical mortality data and captures all the relevant features of the mortality experience. For this reason, the models are evaluated in terms of goodness-of-fit and residual patterns and it will be investigated whether or not a model should allow for cohort effects. Unrealistic model output might be traced back to the calibrated model parameters, which is why it is important that a mortality model is sufficiently transparent to interpret the calibrated parameters. The produced point forecasts should display no obviously implausible behaviour (such as a kink, an increasing rather than a decreasing trend, or predetermined convergence to a constant value) and have to perform well out-of-sample. To obtain the complete density function of a life annuity present value, it is necessary that sample paths and accurate prediction intervals can be produced. Such prediction intervals should take into account all sources of uncertainty, including parameter uncertainty. A mortality model that is not parsimonious might produce prediction intervals that are implausibly wide, due to the estimation of many parameters in combination with a short time series. It is important that prediction intervals do not consistently underestimate levels of uncertainty for certain age groups. Therefore, prediction intervals should be both plausible and consistent with observed variability.

The method used to evaluate goodness-of-fit and forecasting performance requires some choices. Performance can be evaluated on the basis of survival probabilities, mortality rates, life expectancy or annuity prices, depending on the purpose (Dowd et al., 2010a). In this thesis, logit-transformed mortality rates are chosen as the evaluation measure, because three out of four models provide forecasts of mortality rates directly. The logit rather than the log transformation is chosen because the observed variability in mortality rates is larger on the logit scale, which suggests that it provides a more vivid picture of relative goodness-of-fit and forecasting performance between mortality models. As an evaluation of goodness-of-fit and out-of-sample forecasting performance, the mean squared error (MSE), mean percentage error (MPE) and mean absolute percentage error (MAPE) of observed versus fitted logit-transformed age-specific mortality rates are calculated.
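The three error measures can be computed on the logit scale as sketched below. The sign and scaling conventions are assumptions of this sketch, since they are not spelled out in the text: the error is taken as observed minus fitted, and percentage errors are expressed relative to the observed logit-transformed rate. The function names are hypothetical.

```python
import numpy as np

def logit(m):
    """Logit transform log(m / (1 - m)) of mortality rates in (0, 1)."""
    m = np.asarray(m, dtype=float)
    return np.log(m / (1.0 - m))

def fit_metrics(m_obs, m_fit):
    """MSE, MPE (%) and MAPE (%) of fitted versus observed logit mortality rates."""
    y, y_hat = logit(m_obs), logit(m_fit)
    err = y - y_hat                        # assumed convention: observed minus fitted
    return {
        "MSE": float(np.mean(err ** 2)),
        "MPE": float(100.0 * np.mean(err / y)),
        "MAPE": float(100.0 * np.mean(np.abs(err / y))),
    }

# Tiny synthetic example (values are illustrative only).
metrics = fit_metrics([0.010, 0.020, 0.040], [0.011, 0.019, 0.041])
```

Because the MPE averages signed errors, it indicates systematic over- or underestimation, whereas the MAPE and MSE measure the overall size of the deviations.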

The selected evaluation criteria are applied to the age range 60-89 as much as possible, because at ages 90-120 it is hard to pinpoint whether a certain observation should be attributed to the mortality model or to the Kannisto (1994) model used to extrapolate to these ages. The exception is the plausibility of point forecasts. The point forecasts are visually checked to verify that they do not show any clearly implausible behaviour, also at the advanced ages.


The steps taken to answer the research question are as follows. First, the mortality models are compared on the basis of the selected evaluation criteria. Then, the forecasts are used to estimate point forecasts and density functions of various annuity present values. The similarities and differences in the mean, median, variance, interquartile range and 90th and 95th percentiles are analyzed with the help of the insights obtained from the model comparison. Since the annuity prices are set to the mean, 90th and 95th percentiles, the analysis sheds light on both the extent to which model choice affects annuity prices, and how the similarities and differences might be attributed.


3 Empirical Results

This chapter presents the results of the empirical study. The results of the model comparison are discussed in Section 3.1. Section 3.2 evaluates and compares the density functions of the annuity present values produced by the various mortality models and the resulting annuity prices.

3.1 Model Comparison

Figure 2 shows the observed versus fitted log-transformed mortality rates log mx,t for the LC, CBD M7, HU and 2D KS models. The left and center panels show the evolution of log-transformed mortality rates at age 65 and age 85 over the period 1961-2004. The four models produce similar log-transformed age-65 mortality rates, but the center panel shows that log mortality rates at age 85 are consistently underestimated by the CBD M7 model. The right panel shows observed and fitted log-transformed mortality for the ages 60-89 in the year 2004. It appears that the mortality rates at the advanced ages are comparatively low for the CBD M7 model. This can be attributed to the underestimated mortality rates around age 85 for this model, which are part of the age range used to calibrate the Kannisto (1994) law of mortality.

Figure 2: The observed and fitted log-transformed mortality rates. The left and center panels show the evolution of age-65 and age-85 mortality over the period 1961-2004. The right panel shows the progression with age of log-transformed mortality rates in the calendar year 2004.


The model should have high goodness-of-fit to the calibration sample.

To evaluate the goodness-of-fit of each mortality model, the MSE, MPE and MAPE corresponding to the logit-transformed mortality rates $\log\left(\frac{m_{x,t}}{1-m_{x,t}}\right)$ are calculated on the age range 60-89 and time period 1961-2004. The results are shown in Table 2. All four models obtain a close in-sample fit, with a mean absolute percentage error of at most 1.757%. The 2D KS model attains the best goodness-of-fit, whereas the LC model has the weakest in-sample fit. It is no surprise that the HU and CBD M7 models outperform the LC model in-sample, because these models have more parameters and the MSE, MPE and MAPE do not penalize for the number of parameters.

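| Period | Model | MSE | MPE | MAPE |
|---|---|---|---|---|
| 1961-2004 | LC | 0.027 | -1.073 | 1.757 |
| 1961-2004 | CBD M7 | 0.0159 | -0.597 | 0.919 |
| 1961-2004 | HU | 0.079 | -0.786 | 1.158 |
| 1961-2004 | 2D KS | -0.019 | -0.369 | 0.593 |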

Table 2: Goodness-of-fit of the LC, CBD M7, HU and 2D KS models based on the calibration period 1961-2004.

Residuals by age, calendar year and year of birth should be pattern-free.

Scatter plots of the scaled deviance residuals by age, calendar year and year of birth for the LC and CBD M7 models are shown in Figures 3 and 4. For the LC model, the clear ripple pattern in the cohort residual plot indicates the inability of this model to capture year of birth effects. The lack of strong patterns in the residuals by age and calendar year implies that the age and calendar year effects are appropriately captured by the LC model. For the CBD M7 model, a mild pattern in the residuals by age can be observed, suggesting that not all of the age effects are captured. The CBD M7 model does capture the calendar year and year of birth effects appropriately. These results are in line with those presented in Haberman and Renshaw (2011). Figure 5 shows the standardized residuals produced by the HU model. Clear patterns are observable by age, calendar year and year of birth. These patterns in the residual plots indicate that the HU model inadequately captures relevant features of the observed mortality data. The residual plots for the 2D KS model show only subtle patterns by age and calendar year, see Figure 6. The spread of the residuals by cohort declines with year of birth. Overall, it seems that the 2D KS model captures the relevant features of the observed mortality data reasonably well.


Figure 3: The scaled deviance residuals produced by the LC model. From left to right: by age, calendar year and year of birth.

Figure 4: The scaled deviance residuals produced by the CBD M7 model. From left to right: by age, calendar year and year of birth.


Figure 5: The standardized residuals produced by the HU model. From left to right: by age, calendar year and year of birth.

Figure 6: The standardized residuals produced by the 2D KS model. From left to right: by age, calendar year and year of birth.


At least for some countries, the model should incorporate a cohort effect.

The observed patterns in the residuals by year of birth produced by the LC and HU models imply that cohort effects are present in the mortality data, which are not captured appropriately by these models. It appears that cohort effects are an important feature of the England and Wales male population in the period under study. This has been established in the literature, see e.g., Government Actuary’s Department (1995, 2001, 2002); Continuous Mortality Investigation Bureau (2002); Willets (2004); MacMinn et al. (2005); Richards et al. (2006). In particular, Willets (2004) and Richards et al. (2006) found that the largest improvements in mortality rates in England and Wales have been consistently experienced by individuals born between 1925 and 1945 (centered around 1930). Several explanations of this cohort effect have been suggested, namely smoking behaviour, diet in early life and prenatal conditions, see e.g., Willets (2004) and Gavrilov and Gavrilova (2004). Thus, for the England and Wales male population, mortality models should incorporate a stochastic cohort effect to capture all relevant features of the mortality experience.

The model should be transparent.

The calibrated parameters αx, βx and κt for the LC method are shown in Figure 7. It can be observed that the fitted sequence of αx’s is nearly linear and monotonically increasing with only slight curvature. Under the identification constraints that are imposed, the sequence αx reflects the time averages of log mx,t at each age x. The fitted αx profile implies that the overall level of mortality is increasing with age, under the LC method. The estimated calendar year effect κt decreases quite linearly over time, reflecting the declining trend of observed mortality over time. The βx profile indicates that improvements in mortality rates are declining in age for ages 60-89.

The calibrated calendar year effects for the CBD M7 method are shown in Figure 8. The calendar year effects $\kappa_t^{(1)}$, $\kappa_t^{(2)}$ and $\kappa_t^{(3)}$ affect logit-transformed death probabilities at different ages in different ways. The first time index $\kappa_t^{(1)}$ exhibits a downward trend, which expresses the improvement in death probabilities over time for all ages. The second and third time indices modulate the improvements in age-specific logit-transformed death probabilities over time, according to the quadratic function of age specified in equation (2). The estimated cohort effect γt−x can be analyzed to determine which cohorts experienced comparatively high and low death probabilities. For instance, the sequence of γt−x’s suggests that males born around the year 1910 have experienced consistently higher death probabilities than generations born both earlier and later than 1910. The peak observed in γt−x for the oldest cohorts might be attributed to the fact that there are only few observations available for these cohorts.


Figure 9 shows the calibrated location function θx and the K = 4 basis functions φx,k along with the respective coefficients βt,k. The fitted location function θx describes the mean age pattern and suggests that the logarithms of mortality rates are monotonically increasing with age for the age range 60-89. The first fitted basis function φx,1 models changes in mortality over time for the entire age range, but the sensitivity to changes in the calendar year effect βt,1 is decreasing in age. The second basis function φx,2 primarily models the differences in mortality for males between ages 60-74 and those aged 75-89, which can be seen from the fact that φx,2 < 0 for ages below 75 and φx,2 > 0 for older ages. Differences between ages 65-80 and those younger or older than this age segment are captured by the third basis function φx,3. The fourth basis function φx,4 is harder to interpret, but the peaks and troughs indicate which ages are most sensitive to changes in βt,4. The calibrated calendar year effects βt,1 suggest that mortality rates have declined for all ages. The remaining βt,k sequences show an erratic evolution over time, making it hard to interpret the differences in mortality at various ages over time.

For the 2D KS model, the only parameters available are the optimal bandwidths, h1 = 0.014, h2 = 0.0132 and ρ = 0.554. These parameters do not lend themselves well to interpretation of age or calendar year effects. Given the relative size of the calibrated cohort bandwidth ρ = 0.554 compared to the values reported in Table I of Li et al. (2016) for various countries, it can be concluded that there is a comparatively strong cohort effect present in the observed mortality data.

Figure 7: Calibrated parameters αx, βx and κt for the LC model.


Figure 8: Calibrated parameters $\kappa_t^{(1)}$, $\kappa_t^{(2)}$, $\kappa_t^{(3)}$ and γt−x for the CBD M7 model.

Figure 9: Estimated parameters θx, φ1, φ2, φ3, φ4 and βt,1, βt,2, βt,3, βt,4 for the HU model.


Point forecasts should be plausible and consistent with historical trends.

A mortality model may produce point forecasts that are clearly unrealistic. The unrealistic features of a forecast may only become visible for forecasting horizons exceeding the time period that is used in a quantitative back test. Therefore, it is important to evaluate the point forecasts visually, to determine whether the forecasts are plausible and are consistent with historical trends. Figure 10 shows the observed mortality rates mx,t and the point forecasts at ages 60, 75, 85 and 95 produced by the various models. The first observation that can be made is that point forecasts produced by all models are consistent with historical trends. The point forecasts adequately follow the differing rate of decline at different ages. The models produce forecasts of mortality rates that decline faster at age 85 than at age 65, implying that these mortality rates tend to converge in future years.

The point forecasts produced by the LC and HU models are reasonably smooth and show no sudden changes or kinks and therefore pass the plausibility criterion. For the CBD M7 model, some ‘wobbly’ behavior in the point forecasts at all ages can be observed. The point forecasts are linked to the estimated cohort effect and the wobbly behavior occurs in regions where the mortality rate is still influenced by the estimated cohort effect. Once the mortality rates depend only on the (smooth) forecasts of the cohort effect, the progression no longer shows the wobbly behavior. Cairns et al. (2011) observed similar features of the point forecasts and concluded that these can still be considered plausible. However, the age-95 mortality rates produced by the CBD M7 model show a very pronounced kink around 2030. This feature of the forecast is clearly implausible. A robustness check showed that the kink disappears when calibrating the CBD M7 model on the time period 1981-2004, see the Appendix.

The 2D KS model produces point forecasts which at first glance seem plausible at ages 75, 85 and 95. However, at age 60, the point forecast is constant, implying that the mortality rate will not decline further. This idiosyncrasy can be attributed to the fact that the 2D KS model is a kernel smoothing method, which produces estimated mortality rates by ‘borrowing’ information from mortality rates at neighbouring ages and calendar years. This implies that the model enforces a lower limit on mortality rates which equals the lowest observed mortality rate in the calibration sample. I consider this an implausible feature of the point forecasts, because it is clearly unrealistic that there exists a lower limit that is determined in this way.


Figure 10: Point forecasts of mx,t produced by the various models at ages x = 60, x = 75, x = 85 and x = 95. Note that the vertical axis is plotted on a log scale. Crude mortality rates mx,t for 1961-2004 are shown as black dots.

The model should have good out-of-sample forecasting performance.

The models are calibrated to the ages 60-89 and calibration period 1961-2004 and are then used to produce forecasts for the periods 2005-2009 and 2005-2014. Figure 11 shows the forecast logit-transformed mortality rates $\log\left(\frac{m_{x,t}}{1-m_{x,t}}\right)$ for ages 65, 75 and 85 produced by the LC, CBD M7, HU and 2D KS models. The logit-transformed observed mortality rates are also shown. It seems that there is a tendency for the models to overestimate mortality rates, which is least pronounced for the 2D KS model.

To quantify the forecasting performance, back tests on forecast logit-transformed mortality rates are employed using the MSE, MPE and MAPE over the age range 60-89 as performance metrics. The results are shown in Table 3. For both forecast periods, 2005-2009 and 2005-2014, the 2D KS model performs best and the LC model performs worst. The forecasting performance of the CBD M7 and HU models is comparable.


Figure 11: Point forecasts of log mx,t and 95% prediction intervals produced by the LC model at ages x = 65, x = 75 and x = 85. Empirical log mortality rates log mx,t for 1961-2004 are shown as black dots.

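| Period | Model | MSE | MPE | MAPE |
|---|---|---|---|---|
| 2005-2009 | LC | 1.877 | -2.347 | 2.856 |
| 2005-2009 | CBD M7 | 1.484 | -1.606 | 2.311 |
| 2005-2009 | HU | 1.826 | -2.085 | 2.577 |
| 2005-2009 | 2D KS | -0.161 | -1.039 | 1.593 |
| 2005-2014 | LC | 3.039 | -3.385 | 4.149 |
| 2005-2014 | CBD M7 | 2.221 | -2.297 | 3.417 |
| 2005-2014 | HU | 1.914 | -2.442 | 2.969 |
| 2005-2014 | 2D KS | -0.365 | -1.257 | 1.734 |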

Table 3: Out-of-sample performance of the LC, CBD M7, HU and 2D KS models, for forecast periods 2005-2009 and 2005-2014.


It should be possible to use the model to generate sample paths and calculate prediction intervals.

The LC, CBD M7 and HU models use stochastic time series processes to model the evolution of mortality over time. These models allow for the generation of sample paths by simulating forecasts using the fitted time series. Quantiles of these simulations can be used to obtain prediction intervals that properly reflect the stochastic nature of mortality over time. The 2D KS model uses a deterministic forecasting method which neglects the random nature of mortality. This implies that sample paths cannot be generated and that any prediction intervals obtained are clearly too narrow.
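The idea of simulating a fitted time series and reading prediction intervals off the quantiles can be sketched for a single period index. The sketch assumes a random walk with drift, which is a common choice for the LC period index but is an assumption here, and it ignores parameter uncertainty. The function name and the synthetic history `kappa_hist` are introduced purely for illustration.

```python
import numpy as np

def simulate_random_walk(kappa, horizon, n_paths, seed=0):
    """Simulate future paths of a period index as a random walk with drift.

    Drift and innovation standard deviation are estimated from the first
    differences of the historical index; parameter uncertainty is ignored.
    """
    kappa = np.asarray(kappa, dtype=float)
    steps = np.diff(kappa)
    drift, sigma = steps.mean(), steps.std(ddof=1)
    rng = np.random.default_rng(seed)
    shocks = rng.normal(drift, sigma, size=(n_paths, horizon))
    return kappa[-1] + np.cumsum(shocks, axis=1)   # shape (n_paths, horizon)

# Synthetic historical index over 44 calendar years (placeholder, not fitted values).
noise = np.random.default_rng(1).normal(0.0, 0.5, 44)
kappa_hist = np.linspace(10.0, -10.0, 44) + noise

paths = simulate_random_walk(kappa_hist, horizon=30, n_paths=5000)
# Pointwise 95% prediction interval and median from the simulated quantiles.
lower, median, upper = np.percentile(paths, [2.5, 50, 97.5], axis=0)
```

Each simulated path of the period index can be mapped to a path of mortality rates through the model equation, and then to one draw of the annuity present value, which is how the full density of ax,t is obtained.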

The structure of the model should make it possible to incorporate parameter uncertainty in simulations.

For all four models it is possible to incorporate parameter uncertainty. For the LC, CBD M7 and 2D KS models, bootstrapping methods are employed. Parameter uncertainty for the HU model is incorporated by assuming normally distributed sources of error, which can be used to generate sample paths and calculate prediction intervals analytically. The details of the exact method used to incorporate parameter uncertainty are given in the Appendix.

The model should be relatively parsimonious.

The number of effective parameters for each model is shown in Table 4. The effective number of parameters takes account of the constraints on parameters for the LC and CBD M7 models. For the LC, CBD M7 and 2D KS models, the number of effective parameters is fixed for a particular calibration sample. The number of effective parameters used in the HU model depends on the choice of K, the number of basis functions. Following Hyndman and Booth (2008), K is chosen to be sufficiently large, namely K = 4.

| | LC | CBD M7 | HU | 2D KS |
|---|---|---|---|---|
| Effective number of parameters | 102 | 202 | 210 | 3 |

Table 4: The number of effective parameters for each model.


Forecast levels of uncertainty should be plausible and consistent with observed variability in mortality data.

Cairns et al. (2011) calculated empirical volatilities for historical mortality rates for the England and Wales male population. These authors defined the volatility νx at age x by the empirical standard deviation of δ1962,x, ..., δ2004,x, with δt,x = log mx,t − log mx,t−1. Historical volatilities thus obtained are shown in Figure 12. For the England and Wales male population, the historical volatilities νx are generally increasing with age.
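The volatility measure defined above is straightforward to compute from a matrix of log mortality rates. The sketch below assumes rows are ages and columns are consecutive calendar years, and it uses the sample standard deviation (divisor n − 1); both the layout and the divisor are assumptions, as is the function name.

```python
import numpy as np

def historical_volatility(log_m):
    """Empirical volatility per age of yearly changes in log mortality rates.

    log_m has shape (n_ages, n_years), with columns in consecutive calendar years.
    """
    log_m = np.asarray(log_m, dtype=float)
    delta = np.diff(log_m, axis=1)       # delta_{t,x} = log m_{x,t} - log m_{x,t-1}
    return delta.std(axis=1, ddof=1)     # one volatility value per age

# Two synthetic ages: a perfectly linear decline and an alternating series.
log_m_example = np.array([[0.0, -1.0, -2.0, -3.0],
                          [0.0,  1.0,  0.0,  1.0]])
vol = historical_volatility(log_m_example)
```

A steady linear decline has zero volatility under this definition, because the measure captures the variability of yearly changes around their mean and not the trend itself.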

Figure 12: Historical empirical volatilities of mortality rates, as published in Appendix B of Cairns et al. (2011).


Figure 13 shows point forecasts and fan plots of mortality rates mx,t produced by the various models. Each fan plot depicts the 50%, 80% and 95% prediction intervals for mortality rates produced by the various mortality models. They are plotted on a logarithmic scale for ages x = 65, x = 75 and x = 85 so that the forecast levels of uncertainty can be compared with historical variability, as shown in Figure 12.

It can be observed that the LC model produces prediction intervals that are wider at age 65 than at age 85. This is inconsistent with the greater observed volatility in age-85 mortality rates between 1961 and 2004. This observation can be explained by the fact that the LC model has only one period index κt, which implies that the prediction intervals are proportional to the estimated age effects βx. Cairns et al. (2011) deemed that such observations make the prediction intervals implausible. In contrast, the CBD M7 model does show prediction intervals that become progressively wider with age, reflecting the observed volatility.
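The proportionality to βx can be made explicit under the common assumption, not stated in this section, that the LC period index follows a random walk with drift. The drift d, the innovation standard deviation σ and the forecast origin T are notation introduced here for illustration, and parameter uncertainty is ignored.

```latex
\begin{align*}
\log m_{x,t} &= \alpha_x + \beta_x \kappa_t, \\
\kappa_{T+h} &= \kappa_T + h\,d + \sum_{s=1}^{h} \varepsilon_{T+s},
  \qquad \varepsilon_{T+s} \sim N(0, \sigma^2) \ \text{i.i.d.}, \\
\mathrm{Var}\left(\log m_{x,T+h}\right) &= \beta_x^2 \, h \, \sigma^2, \\
\text{width of the 95\% interval for } \log m_{x,T+h} &\approx 2 \times 1.96 \, |\beta_x| \, \sigma \sqrt{h}.
\end{align*}
```

Under these assumptions the interval width at every horizon h is driven by |βx| alone, so an age with a larger fitted βx receives a wider interval regardless of the historical volatility observed at that age.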

Looking at the prediction intervals for the HU model, several observations can be made. The prediction intervals are increasing with age, but expand very rapidly and without limit. Cairns et al. (2011) deemed similarly rapid expansions of prediction intervals produced by other models as implausible. The fact that there is so much uncertainty associated with forecast mortality rates produced by the HU model can be attributed to two factors. The first is that the model is relatively non-parsimonious; it requires four time series models (see Table 1). The second factor is that the time series models are selected using an automatic selection procedure, which leads to high-order ARIMA models that have to be estimated using a short time series.

Li et al. (2016) do not discuss prediction intervals for the 2D KS model. We have attempted to produce prediction intervals, but have not been able to find a proper way to account for the uncertainty associated with future mortality rates. The reason is that the 2D KS model produces deterministic forecasts, rather than forecasts based on a stochastic calendar year effect. Therefore, no fan plots are shown for the 2D KS model and we consider the forecast levels of uncertainty to be implausible.


Figure 13: Point forecasts of mx,t and fan plots depicting the uncertainty associated with the forecast, produced by the various models at ages x = 65 (green), x = 75 (red) and x = 85 (pink). Note that the vertical axis is plotted on a log scale. Observed mortality rates mx,t for 1961-2004 are shown as black dots.


Summary of Model Comparison

Table 5 summarizes the models on the basis of the selected criteria discussed in this section. Each of the models fails to meet at least one important criterion. This makes it difficult to single out one mortality model as the most desirable. The table also suggests that it is fruitful to combine the insights from multiple mortality models, because the models compensate for each other’s weaknesses.

| Criterion | LC | CBD M7 | HU | 2D KS |
|---|---|---|---|---|
| Ranking (Goodness-of-fit) | 4 | 2 | 3 | 1 |
| Inclusion of cohort effects | - | + | - | + |
| Transparency | + | + | +/- | - |
| Pattern-free residuals | - | + | - | + |
| Sample paths and prediction intervals | + | + | + | - |
| Parameter uncertainty | + | + | + | + |
| Plausible point forecasts | + | - | + | +/- |
| Ranking (Forecasting performance) | 4 | 2 | 2 | 1 |
| Plausible prediction intervals | - | + | - | - |

Table 5: An overview of the qualitative comparison of the LC, CBD M7, HU and 2D KS models.

3.2 Annuity Pricing

The purpose of Section 3.1 was to compare the output produced by the mortality models using a set of criteria that will help attribute similarities and differences in annuity prices. This section presents the results related to the annuity present values and prices. Figure 14 shows estimated present values of $a_{65,t}$ and $a_{85,t}$ over the time period 1961-2030. The estimated annuity present values are calculated using estimates (1961-2004) and point forecasts (2005-2100) of the age-specific survival probabilities $p_{x,t}$.

The estimated annuity present values diverge over time, but are quite consistent at the beginning of the time period shown. In the years where annuity present values are consistent between models, they are primarily based on fitted, rather than forecast, survival probabilities. The analysis in Section 3.1 indicated that all mortality models fit the data well and that the fitted mortality rates (one-year death probabilities) were quite similar, which explains the consistency between the estimated annuity present values in those years. The estimated annuity present values start to diverge when they become more dependent on forecast survival probabilities. The estimates of $a_{65,t}$ diverge sooner than those of $a_{85,t}$. This is because annuity present values are based on survival probabilities up to age 120 and thus start to depend on forecasts sooner at age 65 than at age 85.
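The dependence on forecasts can be made concrete with a small sketch. It computes a cohort annuity present value from one-year survival probabilities and counts how many of those probabilities lie beyond the fitted period. The payment convention (one unit at the end of each survived year), the flat 4% discount rate and the toy survival curve are assumptions chosen for illustration; they are not taken from the thesis.

```python
def annuity_present_value(age, year, survival_prob, rate=0.04, max_age=120):
    """Present value of a life annuity paying 1 at the end of each survived year.

    survival_prob(x, t) returns the one-year survival probability p_{x,t}.
    The cohort is followed along the diagonal: age x+k in calendar year t+k.
    """
    v = 1.0 / (1.0 + rate)      # one-year discount factor (assumed flat)
    alive = 1.0                 # probability of surviving the first k+1 years
    value = 0.0
    for k in range(max_age - age):
        alive *= survival_prob(age + k, year + k)
        value += v ** (k + 1) * alive
    return value


def years_needing_forecasts(age, year, last_fitted_year=2004, max_age=120):
    """Count the survival probabilities in the sum that lie beyond the fitted period."""
    return sum(1 for k in range(max_age - age) if year + k > last_fitted_year)


def toy_survival(x, t):
    # Hypothetical survival curve, for illustration only: survival falls with age.
    return max(0.0, 1.0 - 0.005 * 1.1 ** (x - 60))
```

Under these assumptions, an annuity sold in 1975 at age 65 already involves 25 survival probabilities from forecast years, against 5 for an annuity sold at age 85, which is the mechanism behind the earlier divergence of $a_{65,t}$.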

The slope of the point forecasts of $a_{65,t}$ and $a_{85,t}$ varies over time, reflecting the varying rates of mortality improvement produced by the different models at different ages and over time. The 2D KS model consistently produces the highest point forecasts of $a_{65,t}$ and the LC method the lowest. For $a_{85,t}$ the LC method also consistently produces the lowest point forecasts, but the highest point forecast is now produced by the HU method. These observations can be attributed to the fact that the point forecasts of mortality rates produced by the LC model were comparatively high, whereas those of the 2D KS and HU methods were relatively low. This implies that survival probabilities produced by the LC model are comparatively low, leading to lower annuity present values, and vice versa for the HU and 2D KS models.

Figure 14: Estimated present values of $a_{65,t}$ and $a_{85,t}$ over time, produced by the LC (black), CBD M7 (red), HU (blue) and 2D KS (green) models.

The point forecasts of $a_{65,t}$ and $a_{85,t}$ produced by the LC and HU models progress reasonably smoothly over time. The point forecasts produced by the CBD M7 and 2D KS models evolve more erratically. The CBD M7 and 2D KS models incorporate cohort effects, whereas the LC and HU models do not. This indicates that the erratic evolution over time of $a_{65,t}$ and $a_{85,t}$ might be attributable to the inclusion of cohort effects. The ‘wobbly’ behavior exhibited in the point forecasts of the CBD M7 model provides substantiating evidence that the erratic progression in the point forecasts of $a_{65,t}$ and $a_{85,t}$ is indeed caused by the inclusion of cohort effects.

The analysis based on point estimates of annuity present values $a_{x,t}$ has thus far indicated that annuity present values only differ markedly between the various mortality models when they depend primarily on forecast survival probabilities. The analysis in Section 3.1 has shown that the LC, CBD M7 and HU models are able to produce simulated sample paths of future mortality rates (one-year death probabilities). It was found that the 2D KS model is not suitable for producing simulated future mortality rates. When simulated sample paths are available, they can be used to obtain estimates of the entire density function of $a_{x,t}$ rather than just point estimates. This allows for the assessment of the uncertainty associated with present values. Figure 15 shows the estimated densities of the annuity present values $a_{65,2005}$, $a_{85,2005}$, $a_{65,2030}$ and $a_{85,2030}$ produced by the various models. A selection of summary statistics corresponding to the density functions is shown in Tables 6 and 7.
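As a minimal sketch of how summary statistics of the kind reported in Tables 6 and 7 could be obtained from a set of simulated annuity present values, the function below uses only Python's standard library. The function name and the choice of the "inclusive" quantile method are assumptions for illustration; the thesis does not state which quantile definition was used.

```python
import statistics


def summarize(simulated_values):
    """Summary statistics of simulated annuity present values."""
    # 99 cut points: cuts[i] is the (i + 1)-th percentile.
    cuts = statistics.quantiles(simulated_values, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(simulated_values),
        "median": statistics.median(simulated_values),
        "variance": statistics.variance(simulated_values),  # sample variance
        "interquartile_range": cuts[74] - cuts[24],
        "p90": cuts[89],
        "p95": cuts[94],
    }
```

Feeding in one simulated present value per sample path gives a single column of such a table; a point forecast, by contrast, needs no simulation, which is why it is the only entry available for a model without sample paths.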

Figure 15: Estimated densities of $a_{65,2005}$, $a_{65,2030}$, $a_{85,2005}$ and $a_{85,2030}$ produced by the LC (black), CBD M7 (red), HU (green) and 2D KS (blue) models.

a65,2005              LC       CBD M7   HU       2D KS
Point forecast        14.05    14.461   14.565   14.979
Mean                  14.046   14.489   14.857   -
Median                14.046   14.458   14.854   -
Variance              0.104    0.336    0.084    -
Interquartile range   0.443    0.741    0.394    -
90th percentile       14.465   15.242   15.230   -
95th percentile       14.574   15.512   15.338   -

a85,2005              LC       CBD M7   HU       2D KS
Point forecast        4.503    4.277    4.508    4.826
Mean                  4.504    4.290    4.521    -
Median                4.503    4.272    4.522    -
Variance              0.005    0.0740   0.007    -
Interquartile range   0.098    0.359    0.117    -
90th percentile       4.593    4.644    4.631    -
95th percentile       4.62     4.767    4.663    -

Table 6: Summary statistics for $a_{65,2005}$ and $a_{85,2005}$ for the LC, CBD M7, HU and 2D KS models.

a65,2030              LC       CBD M7   HU       2D KS
Point forecast        15.847   16.371   17.575   17.213
Mean                  15.829   16.414   18.32    -
Median                15.85    16.355   18.314   -
Variance              0.382    1.220    1.258    -
Interquartile range   0.825    1.461    1.532    -
90th percentile       16.605   17.836   19.76    -
95th percentile       16.826   18.300   20.168   -

a85,2030              LC       CBD M7   HU       2D KS
Point forecast        5.209    5.505    5.856    6.698
Mean                  5.209    5.582    6.098    -
Median                5.209    5.498    6.08     -
Variance              0.059    0.999    0.330    -
Interquartile range   0.320    1.305    0.770    -
90th percentile       5.517    6.853    6.835    -
95th percentile       5.607    7.291    7.076    -

Table 7: Summary statistics for $a_{65,2030}$ and $a_{85,2030}$ for the LC, CBD M7, HU and 2D KS models.

Some general observations can be made that hold for all models. The mean value of $a_{65,2005}$ is consistently higher than that of $a_{85,2005}$. This reflects the fact that liabilities are expected to be higher for an annuity sold to an annuitant aged 65 than for an annuitant aged 85, because the former is expected to live longer. A second general observation is that the mean of the estimated densities is higher in the year 2030 than in 2005, for both ages 65 and 85. This reflects the improvement in longevity over time that is forecast by all of the models. Furthermore, the estimated variance and interquartile range increase over time. This reflects the increasing uncertainty associated with the mortality forecasts produced by the various mortality models, which is propagated to the annuity present values. The estimated densities are symmetric, which can be attributed to the distributional assumptions on the error structures for each of the models.

The comparison of the various mortality models in Section 3.1 sheds light on the differences in forecast mortality rates and associated prediction intervals produced by the models. These differences are reflected in the density functions of the annuity present values. The mean and median values of the annuity present values show considerable differences between models, which can be attributed to the differences in point forecasts of mortality rates produced by the various models. The variance, interquartile range, 90th and 95th percentiles also show marked differences between models. This is the result of the considerable differences in the prediction intervals produced by the various mortality models. In Section 3.1 it became clear that the differences in point forecasts and prediction intervals between mortality models become increasingly prominent as the forecasting horizon grows. This is reflected in the estimated density functions of the annuity present values, since the differences tend to be larger for $a_{x,2030}$ than for $a_{x,2005}$.

The differences in the summary statistics can be attributed using the findings presented in Section 3.1. The analysis indicated that the CBD M7 and 2D KS models are able to capture all relevant features of the mortality data, whereas the LC and HU models are not. In particular, the LC and HU models do not incorporate a cohort effect. Looking at Tables 6 and 7, the LC and HU models do not consistently produce higher or lower mean annuity values than the CBD M7 and 2D KS models. This indicates that models without a cohort effect do not systematically underestimate or overestimate mean annuity present values. The back test on logit-transformed mortality rates performed in Section 3.1 indicated that all the mortality models tend to overestimate mortality rates. When mortality rates are overestimated, annuity present values are lower than they should be. The 2D KS model yields comparatively high mean annuity present values. This can be attributed to the fact that the 2D KS model seems to overestimate mortality rates to a lesser extent than the other models.

Furthermore, it was found that the LC model underestimates volatility at high ages. This is reflected in the comparatively low variances, interquartile ranges and 90th and 95th percentiles associated with the annuity present values. The HU model produced rapidly expanding prediction intervals, which explains why annuity present values that depend primarily on long-term forecasts (e.g., $a_{65,2030}$) show particularly high values for the variances, interquartile ranges and 90th and 95th percentiles. Because the prediction intervals produced by the HU model expand rapidly and without limit, the measures of uncertainty produced by the HU model can be expected to be even larger relative to the LC and CBD M7 models for annuities sold later than 2030.

The observed differences in the estimated mean, 90th and 95th percentiles produced by the various mortality models yield considerable differences in annuity prices. For prices based on the estimated mean present value of $a_{65,2005}$, the relative difference between the lowest price (LC model) and the highest price (2D KS model) is 6.6%. For $a_{85,2005}$, $a_{65,2030}$ and $a_{85,2030}$ the largest relative differences are 14.1%, 8.6% and 28.6%, respectively. These differences are economically significant: if the annuity $a_{85,2030}$ is priced based on the mean present value, the price based on the 2D KS method is almost 30% higher than the price based on the LC method. Economically significant differences are also found when setting prices equal to the estimated 90th and 95th percentiles. Based on the 90th percentile, the largest relative differences between prices for $a_{65,2005}$, $a_{85,2005}$, $a_{65,2030}$ and $a_{85,2030}$ are 5.4%, 3.2%, 19.0% and 24.2%, respectively. Based on the 95th percentile, the relative differences between prices are 6.4%, 3.2%, 19.8% and 30.0%, respectively.
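The arithmetic behind such relative differences can be sketched as follows. The sketch assumes that the difference is measured relative to the lowest price, and that the 2D KS point forecast stands in for its unavailable mean; both conventions are inferred from the reported 6.6%, 28.6% and 30.0% figures rather than stated explicitly in the text. The input values are copied from Tables 6 and 7.

```python
def relative_difference(prices):
    """Largest relative price difference, measured against the lowest price."""
    lowest, highest = min(prices), max(prices)
    return (highest - lowest) / lowest


# Means for LC, CBD M7 and HU; point forecast for 2D KS (assumed stand-in for its mean).
mean_a65_2005 = [14.046, 14.489, 14.857, 14.979]
mean_a85_2030 = [5.209, 5.582, 6.098, 6.698]

# 95th percentiles for LC, CBD M7 and HU (no percentile is available for 2D KS).
p95_a85_2030 = [5.607, 7.291, 7.076]

for label, prices in [
    ("mean a65,2005", mean_a65_2005),
    ("mean a85,2030", mean_a85_2030),
    ("95th pct a85,2030", p95_a85_2030),
]:
    print(f"{label}: {100 * relative_difference(prices):.1f}%")
```

Measuring against the lowest price means each figure reads as "the highest price exceeds the lowest by this percentage"; measuring against the highest price would give smaller numbers.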

4 Conclusion

The objective of this thesis was to evaluate the extent to which model choice affects annuity prices and how the similarities and differences between annuity prices can be attributed. It turns out that the LC, CBD M7, HU and 2D KS models produce economically significant differences in annuity prices. Prices differ by as much as 28.6%, 24.2% and 30.0% based on the mean, 90th percentile and 95th percentile of the annuity present value, respectively. In some cases annuity prices are quite similar, but overall more differences than similarities can be found.

A comparative analysis of the LC, CBD M7, HU and 2D KS models indicated that none of the models satisfies all the desirable criteria for a mortality model (i.e., high goodness-of-fit, pattern-free residuals, incorporation of all relevant features of the mortality data, transparency, plausible point forecasts and prediction intervals, and good out-of-sample performance). Therefore, it cannot be concluded that one mortality model performs better than the others in an absolute sense. This implies that calculating annuity prices using multiple mortality models is fruitful, because the insights derived from the various models can be combined and the different models compensate for each other's undesirable properties.

The differences in annuity prices are attributable to some extent. Differences in annuity prices based on the mean present value can be explained by mortality forecasts that are substantially different between mortality models. The 2D KS method consistently produces the highest prices based on the mean present value, which is due to the comparatively strong mortality improvements that it forecasts. Differences in annuity prices based on the 90th and 95th percentiles can be attributed to differences in the prediction intervals produced by the various mortality models. The LC model underestimates volatility at high ages, thus producing annuity prices that are comparatively low. The HU model produces prediction intervals that increase rapidly in width, explaining the high prices of annuities that are dependent on long-term mortality forecasts. As annuity prices become more dependent on long-term forecasts (i.e., $a_{x,2030}$ compared to $a_{x,2005}$), differences in annuity prices get larger. This can be attributed to the fact that the point forecasts and prediction intervals produced by the various models were found to diverge as the forecasting horizon increases.

All four models exhibit implausible features either in point forecasts (CBD M7 and 2D KS) or prediction intervals (LC, HU, 2D KS). This diminishes the reliability of the forecasts produced by all mortality models. Furthermore, there is a general tendency of the mortality models to overestimate future mortality rates, despite having high goodness-of-fit and good out-of-sample forecasting performance. Annuity providers are therefore advised to use more conservative annuity prices, based on the 90th and 95th percentile rather than the mean present value, to avoid underestimating future liabilities. Since the 2D KS model is unsuitable for calculating such prices, it is rejected despite having the best goodness-of-fit and out-of-sample forecasting performance.

Limitations and Future Research

The mortality models have been calibrated to the same age range and time period. Mortality forecasts and annuity prices can be sensitive to the time period and age range that are used to calibrate the mortality models. The extent to which forecasts and annuity prices are robust to changes in the calibration sample can be investigated through a robustness analysis. One of the limitations of this thesis is that, because of time constraints, no such robustness analysis was performed. Ideally, annuity prices should be computed using various calibration samples. A model that produces annuity prices that are more robust to the calibration sample might be favored over mortality models that do not. It is also possible to determine the optimal calibration period for each model separately; see for instance Pitacco et al. (2009).

The data is not sufficiently rich to establish which model prices annuities most ‘correctly’. Ideally, data from an annuity provider is used, because it allows back testing of annuity present values. On the basis of such an analysis, a particular mortality model might be more confidently chosen as the winner, in the sense that it gave the best ex-post assessment of the risks held by the annuity provider. Another limitation of the data is that the general population is considered. Studies have shown evidence of selection effects in the annuities market; see e.g., Finkelstein and Poterba (2002, 2004). In particular, annuitants are longer-lived than non-annuitants. Furthermore, other factors such as socio-economic status and lifestyle have been associated with differentials in mortality; see e.g., Balia and Jones (2008). The population that is used to calibrate the mortality models should reflect the portfolio of the annuity provider as much as possible, to prevent underestimation of the firm's risk exposure. One extension of this thesis is to compare annuity prices based on a population of insureds, which more accurately matches a typical risk portfolio of an annuity provider. Such a data set could be provided by an annuity provider, but unfortunately was not available to me.

In this thesis, the male population of England and Wales is considered to calibrate the mortality models. The models have also been applied to other developed economies that have reliable mortality data available (i.e., the United States, Japan, Australia, the Netherlands, France and Spain) and female mortality. Since these populations may exhibit different mortality features than the England and Wales male population, the conclusions of this thesis cannot necessarily be generalized to other countries. A possible direction for future research is to assess the extent to which the findings in this thesis hold for other populations.

The mortality models discussed in this thesis are generally used to forecast mortality for a single population. When forecasting combined populations (i.e., males and females, different states within a country or different countries), additional challenges arise. Therefore, a suggestion for future research is to extend this thesis by applying its methodology to mortality models that are designed for combined populations, such as Li and Lee (2005); Cairns et al. (2011); Hyndman et al. (2013); Antonio et al. (2015).

In addition to longevity risk, annuity providers also face a variety of other important risks, such as interest rate risk. This risk is significant, due to the long maturities of the liabilities. When interest rates are lower than anticipated, firms might not be able to earn enough of a return on their reserves to ensure their financial health. Throughout this thesis, a fixed discount rate is used to price annuities. One way to expand on this thesis is to incorporate a stochastic model for interest rates and use it to calculate the present value of an annuity. Some useful resources on this topic are Panjer and Bellhouse (1980) and Beekman and Fuelling (1990, 1992, 1993).

5 References

Antonio, Katrien (2012). Sluiten van een periodetafel GBM/V 2005-2010, extern rapport als bijlage bij Prognosetafel AG 2012-2062. https://www.ag-ai.nl/download/26411-Sluiten_van_de_periodetafel_GBMV_2005-2010.pdf.

Antonio, Katrien, Anastasios Bardoutsos, and Wilbert Ouburg (2015). Bayesian Poisson log-bilinear models for mortality projections with multiple populations. European Actuarial Journal 5 (2), 245–281.

Balia, Silvia and Andrew M. Jones (2008). Mortality, lifestyle and socio-economic status. Journal of Health Economics 27 (1), 1–26.

Barrieu, Pauline, Harry Bensusan, Nicole El Karoui, Caroline Hillairet, Stephane Loisel, Claudia Ravanelli, and Yahia Salhi (2012). Understanding, modelling and managing longevity risk: key issues and main challenges. Scandinavian Actuarial Journal 2012 (3), 203–231.

Beekman, John A. and Clinton P. Fuelling (1990). Interest and mortality randomness in some annuities. Insurance: Mathematics and Economics 9 (2), 185–196.

Beekman, John A. and Clinton P. Fuelling (1992). Extra randomness in certain annuity models. Insurance: Mathematics and Economics 10 (4), 275–287.

Beekman, John A. and Clinton P. Fuelling (1993). One approach to dual randomness in life insurance. Scandinavian Actuarial Journal 1993 (2), 173–182.

de Beer, Joop, Anastasios Bardoutsos, and Fanny Janssen (2017). Maximum human lifespan may increase to 125 years. Nature 546 (7660), E16–E17.

de Beer, Joop and Fanny Janssen (2016). A new parametric model to assess delay and compression of mortality. Population Health Metrics 14 (1), 46.

Booth, Heather, Rob J. Hyndman, Leonie Tickle, and Piet De Jong (2006). Lee-Carter mortality forecasting: a multi-country comparison of variants and extensions. Demographic Research 15, 289–310.

Brouhns, Natacha, Michel Denuit, and Ingrid Van Keilegom (2005). Bootstrapping the Poisson log-bilinear model for mortality forecasting. Scandinavian Actuarial Journal 2005 (3), 212–224.

Brouhns, Natacha, Michel Denuit, and Jeroen K. Vermunt (2002a). Measuring the longevity risk in mortality projections. Bulletin of the Swiss Association of Actuaries 2, 105–130.

Brouhns, Natacha, Michel Denuit, and Jeroen K. Vermunt (2002b). A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Mathematics and Economics 31 (3), 373–393.

Cairns, Andrew J.G., David Blake, and Kevin Dowd (2006). A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. Journal of Risk and Insurance 73 (4), 687–718.

Cairns, Andrew J.G., David Blake, and Kevin Dowd (2008). Modelling and management of mortality risk: a review. Scandinavian Actuarial Journal 2008 (2-3), 79–113.

Cairns, Andrew J.G., David Blake, Kevin Dowd, Guy D. Coughlan, David Epstein, and Marwa Khalaf-Allah (2011). Mortality density forecasts: An analysis of six stochastic mortality models. Insurance: Mathematics and Economics 48 (3), 355–367.

Cairns, Andrew J.G., David Blake, Kevin Dowd, Guy D. Coughlan, David Epstein, Alen Ong, and Igor Balevich (2009). A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal 13 (1), 1–35.

Cairns, Andrew J.G., David Blake, Kevin Dowd, Guy D. Coughlan, and Marwa Khalaf-Allah (2011). Bayesian stochastic mortality modelling for two populations. ASTIN Bulletin: The Journal of the IAA 41 (1), 29–59.

Chen, Chung and Lon-Mu Liu (1993). Joint estimation of model parameters and outlier effects in time series. Journal of the American Statistical Association 88 (421), 284–297.

Continuous Mortality Investigation Bureau (2002). An interim basis for adjusting the “92” series mortality projections for cohort effects. Technical report, The Faculty of Actuaries and Institute of Actuaries.

Continuous Mortality Investigation Bureau (2005). Projecting future mortality: Towards a proposal for a stochastic methodology. Technical report, Working paper 15.

Continuous Mortality Investigation Bureau (2006). Stochastic projection methodologies: Further progress and p-spline model features, example results and implications. Technical report, Working paper 20.

Currie, Iain (2006). Smoothing and forecasting mortality rates with p-splines. Talk given at the Institute of Actuaries.

Currie, Iain (2016). On fitting generalized linear and non-linear models of mortality. Scandinavian Actuarial Journal 2016 (4), 356–383.

Currie, Iain, Maria Durban, and Paul H.C. Eilers (2004). Smoothing and forecasting mortality rates. Statistical Modelling 4 (4), 279–298.

Doray, Louis G. (2008). Inference for logistic-type models for the force of mortality. Living to 100 and beyond, SOA Monograph M-L108 1, 18.

Dowd, Kevin, Andrew J.G. Cairns, David Blake, Guy D. Coughlan, David Epstein, and Marwa Khalaf-Allah (2010a). Backtesting stochastic mortality models: an ex post evaluation of multiperiod-ahead density forecasts. North American Actuarial Journal 14 (3), 281–298.

Dowd, Kevin, Andrew J.G. Cairns, David Blake, Guy D. Coughlan, David Epstein, and Marwa Khalaf-Allah (2010b). Evaluating the goodness of fit of stochastic mortality models. Insurance: Mathematics and Economics 47 (3), 255–265.

Finkelstein, Amy and James Poterba (2002). Selection effects in the United Kingdom individual annuities market. The Economic Journal 112 (476), 28–50.

Finkelstein, Amy and James Poterba (2004). Adverse selection in insurance markets: Policyholder evidence from the UK annuity market. Journal of Political Economy 112 (1), 183–208.

Gavrilov, Leonid A. and Natalia S. Gavrilova (2004). Early-life programming of aging and longevity: the idea of high initial damage load (the HIDL hypothesis). Annals of the New York Academy of Sciences 1019 (1), 496–501.

Gavrilova, Natalia S. and Leonid A. Gavrilov (2014). Mortality trajectories at extreme old ages: a comparative study of different data sources on US old-age mortality. Living to 100 monograph 2014.

Gompertz, Benjamin (1825). On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London 115, 513–583.

Government Actuary’s Department (1995). National population projections 1992-based. Technical report, HMSO, London.

Government Actuary’s Department (2001). National population projections: review of methodology for projecting mortality. Technical report, Government Actuary’s Department, London.

Government Actuary’s Department (2002). National population projections 2000-based. Technical report, Government Actuary’s Department, London.

Haberman, Steven and Arthur Renshaw (2011). A comparative study of parametric mortality projection models. Insurance: Mathematics and Economics 48 (1), 35–55.

Hall, Peter (1993). On Edgeworth expansion and bootstrap confidence bands in nonparametric curve estimation. Journal of the Royal Statistical Society. Series B (Methodological), 291–304.

Hardle, Wolfgang and Adrian W. Bowman (1988). Bootstrapping in nonparametric regression: local adaptive smoothing and confidence bands. Journal of the American Statistical Association 83 (401), 102–110.

Heligman, Larry and John H. Pollard (1980). The age pattern of mortality. Journal of the Institute of Actuaries 107, 49–80.

Hyndman, Rob J. and Heather Booth (2008). Stochastic population forecasts using functional data models for mortality, fertility and migration. International Journal of Forecasting 24 (3), 323–342.

Hyndman, Rob J., Heather Booth, Leonie Tickle, and John Maindonald (2017). demography: Forecasting mortality, fertility, migration and population data. R package.

Hyndman, Rob J., Heather Booth, and Farah Yasmeen (2013). Coherent mortality forecasting: the product-ratio method with functional time series models. Demography 50 (1), 261–283.

Hyndman, Rob J. and Yeasmin Khandakar (2008). Automatic time series for forecasting: the forecast package for R. Journal of Statistical Software 27 (3), 1–22.

Hyndman, Rob J., Mitchell O’Hara-Wild, Christoph Bergmeir, Slava Razbash, and Earo Wang (2017). forecast: Forecasting functions for time series. R package. http://pkg.robjhyndman.com/forecast.

Hyndman, Rob J. and M.D. Shahid Ullah (2007). Robust forecasting of mortality and fertility rates: a functional data approach. Computational Statistics & Data Analysis 51 (10), 4942–4956.

International Monetary Fund (2012). Global Financial Stability Report: The Quest for Lasting Stability. Washington, D.C.: International Monetary Fund.

Kannisto, Vaino (1994). Development of oldest-old mortality 1950-1990: evidence from 28 developed countries. Odense, Denmark: Odense University Press.

Lee, Ronald D. and Lawrence R. Carter (1992). Modeling and forecasting US mortality. Journal of the American Statistical Association 87 (419), 659–671.

Li, Han, Colin O’Hare, and Farshid Vahid (2016). Two-dimensional kernel smoothing of mortality surface: An evaluation of cohort strength. Journal of Forecasting 35 (6), 553–563.

Li, Nan and Ronald Lee (2005). Coherent mortality forecasts for a group of populations: An extension of the Lee-Carter method. Demography 42 (3), 575–594.

Li, Qi and Jeffrey S. Racine (2007). Nonparametric Econometrics: Theory and Practice. Princeton, US: Princeton University Press.

MacMinn, Richard, Krzysztof Ostaszewski, Ranee Thiagarajah, and Frederik Weber (2005). An investigation of select birth cohorts. Living to 100 and Beyond.

Makeham, William M. (1860). On the law of mortality and construction of annuity tables. Journal of the Institute of Actuaries 8 (6), 301–310.

McDonald, A.S., Andrew J.C. Cairns, P.L. Gwilt, and K.A. Miller (1998). An international comparison of recent trends in mortality. British Actuarial Journal 4, 3–141.

Nadaraya, Elizbar A. (1964). On estimating regression. Theory of Probability & Its Applications 9 (1), 141–142.

Neumann, Michael H. and Jorg Polzehl (1998). Simultaneous bootstrap confidence bands in nonparametric regression. Journal of Nonparametric Statistics 9 (4), 307–333.

Oeppen, Jim and James W Vaupel (2002). Broken limits to life expectancy. Science 296 (5570), 1029–1031.

Panjer, Harry H. and David R. Bellhouse (1980). Stochastic modelling of interest rates with applications to life contingencies. Journal of Risk and Insurance, 91–110.

Pitacco, Ermanno, Michel Denuit, and Steven Haberman (2009). Modelling longevity dynamics for pensions and annuity business. Oxford, England: Oxford University Press.

Plat, Richard (2009). On stochastic mortality modeling. Insurance: Mathematics and Economics 45 (3), 393–404.

Politis, Dimitris N. (2014). Bootstrap confidence intervals in nonparametric regression without an additive model. In Topics in Nonparametric Statistics, pp. 271–282. Springer.

Renshaw, Arthur and Steven Haberman (2003). Lee-Carter mortality forecasting with age-specific enhancement. Insurance: Mathematics and Economics 33 (2), 255–272.

Renshaw, Arthur and Steven Haberman (2006). A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance: Mathematics and Economics 38 (3), 556–570.

Richards, Stephen J., James Kirkby, and Iain Currie (2006). The importance of year of birth in two-dimensional mortality data. British Actuarial Journal 12 (1), 5–38.

Thatcher, A.R., Vaino Kannisto, and James W. Vaupel (1998). The force of mortality at ages 80 to 120. Odense, Denmark: Odense University Press.

Villegas, Andres M., Vladimir K. Kaishev, and Pietro Millossovich (2015). StMoMo: an R package for stochastic mortality modelling. http://github.com/amvillegas/StMoMo.

Watson, Geoffrey S. (1964). Smooth regression analysis. Sankhya: The Indian Journal of Statistics, Series A, 359–372.

Willets, R.C. (2004). The cohort effect: insights and explanations. British Actuarial Journal 10 (4), 833–877.

Wong-Fupuy, Carlos and Steven Haberman (2004). Projecting mortality trends: recent developments in the United Kingdom and the United States. North American Actuarial Journal 8 (2), 56–83.

Wood, Simon N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65 (1), 95–114.

Yang, Sharon S., Jack C. Yue, and Hong-Chih Huang (2010). Modeling longevity risks using a principal component approach: A comparison with existing stochastic mortality models. Insurance: Mathematics and Economics 46 (1), 254–270.

Yu, K. and M.C. Jones (2004). Likelihood-based local linear estimation of the conditional variance function. Journal of the American Statistical Association 99 (465), 139–144.

6 Appendix

The following sections present the technical details required to reproduce the results in this thesis. For readers wishing to replicate the results, the R code used to produce all the results is available at http://www.sourceforge.net/projects/AnnuityPricing. For the LC, CBD M7 and HU models, I have used existing implementations in R. The LC and CBD M7 methods are straightforward to implement using the StMoMo R package (Villegas et al., 2015). The HU method is straightforward to implement using the demography R package (Hyndman et al., 2017). To the best of my knowledge, no public implementation of the 2D KS model existed prior to the code that is made available alongside this thesis.

Technical Details for the Lee-Carter model

Evaluating Residual Patterns

When a stochastic mortality model is calibrated using maximum likelihood, it is appropriate to evaluate residual patterns using the scaled deviance residuals. These are defined as:

$$r_{x,t} = \operatorname{sign}(d_{x,t} - \hat{d}_{x,t}) \sqrt{\frac{\operatorname{dev}(x,t)}{\phi}}, \quad \text{where } \phi = \frac{D(d_{x,t}, \hat{d}_{x,t})}{K - \nu}. \tag{6}$$

In the above, $d_{x,t}$ are the observed death numbers and $\hat{d}_{x,t}$ the expected number of deaths under the calibrated model. Further, $D(d_{x,t}, \hat{d}_{x,t}) = \sum_x \sum_t \operatorname{dev}(x,t)$ is the total deviance of the model, $K = \sum_x \sum_t 1$ is the number of observations used in the calibration sample and $\nu$ is the effective number of parameters in the model. The implementation of the Lee-Carter model that is used in this thesis assumes a Poisson death distribution. For a Poisson death distribution the deviances are defined as:

$$\operatorname{dev}(x,t) = 2\left[ d_{x,t} \log\left( \frac{d_{x,t}}{\hat{d}_{x,t}} \right) - (d_{x,t} - \hat{d}_{x,t}) \right].$$
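The residual definition can be sketched in a few lines of Python. This is an illustrative standard-library version that works on flat lists of observed and fitted death counts; the function names are hypothetical and it is not the StMoMo implementation. The treatment of cells with zero observed deaths uses the limit of $d \log d$ as $d \to 0$, which is an assumption added here.

```python
import math


def poisson_deviance(observed, expected):
    """Deviance contribution dev(x, t) of one age-year cell under a Poisson model."""
    if observed == 0:
        # d * log(d / d_hat) tends to 0 as d -> 0, leaving 2 * d_hat.
        return 2.0 * expected
    return 2.0 * (observed * math.log(observed / expected) - (observed - expected))


def scaled_deviance_residuals(observed, expected, n_params):
    """Scaled deviance residuals for flat lists of observed and fitted deaths."""
    devs = [poisson_deviance(d, d_hat) for d, d_hat in zip(observed, expected)]
    phi = sum(devs) / (len(devs) - n_params)      # total deviance / (K - nu)
    return [
        # max(..., 0.0) guards against tiny negative deviances from rounding.
        math.copysign(math.sqrt(max(dev, 0.0) / phi), d - d_hat)
        for dev, d, d_hat in zip(devs, observed, expected)
    ]
```

By construction the squared residuals sum to $K - \nu$, which gives a quick sanity check on any implementation. A perfect fit (total deviance zero) would make the scale parameter zero and is not handled by this sketch.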

Forecasting

The calibration of the LC model on the period 1961-2004 yields estimates $\hat{\alpha}_x$, $\hat{\beta}_x$ for $x = 60, ..., 89$ and $\hat{\kappa}_t$ for $t = t_0, ..., t_n$. In order to forecast mortality, the calibrated time index is modeled and forecasted using an ARIMA process. Following Lee and Carter (1992), we model the calibrated period index $\hat{\kappa}_t$ as a univariate random walk with drift, that is,

$$ \hat{\kappa}_t = \delta + \hat{\kappa}_{t-1} + \varepsilon_t, \quad \text{where } \varepsilon_t \sim N(0, \sigma_\kappa^2). $$

The drift $\delta$ and the variance $\sigma_\kappa^2$ are estimated using OLS, following the approach described in Haberman and Renshaw (2011). Letting $t_n = 2004$ denote the last calendar year for which we have an estimated period index, successive substitution gives:

$$ \kappa_{t_n+h} = h\delta + \hat{\kappa}_{t_n} + \varepsilon_{t_n+h} + \varepsilon_{t_n+h-1} + \cdots + \varepsilon_{t_n+1}. \tag{7} $$


The h-step ahead point forecast $\hat{\kappa}_{t_n+h}$ is then obtained by taking the expectation of (7), that is, $\hat{\kappa}_{t_n+h} = \hat{\kappa}_{t_n} + h\hat{\delta}$. Similarly, simulated values $\{\kappa_{t_n+h,j}\}_{j=1}^{B}$ can be generated by sampling the errors $\varepsilon_{t_n+1}, ..., \varepsilon_{t_n+h}$ independently from the $N(0, \hat{\sigma}_\kappa^2)$ distribution and plugging these into (7).

To obtain simulated values for the h-step ahead forecast of the logarithms of mortality rates $\log m_{x,t_n+h}$, the simulated values $\kappa_{t_n+h,j}$ are plugged into the calibrated LC model:

$$ \log m_{x,t_n+h,j} = \hat{\alpha}_x + \hat{\beta}_x \kappa_{t_n+h,j}. $$

The point forecast of the logarithm of the age-specific mortality rate in calendar year $t_n+h$ is then obtained by taking the mean value of $\log m_{x,t_n+h,j}$ over the $B$ simulations. Similarly, prediction intervals can be obtained by calculating appropriate percentile values.
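The forecasting recipe above can be illustrated with a short simulation. The following is a sketch in Python rather than the R code used for the thesis; the function names are hypothetical, and estimating the innovation standard deviation by the sample standard deviation of the first differences is an assumption about the exact OLS variant.

```python
import random
import statistics

def fit_rw_drift(kappa):
    """Drift and innovation s.d. of a random walk with drift."""
    diffs = [b - a for a, b in zip(kappa, kappa[1:])]
    drift = statistics.fmean(diffs)      # OLS estimate of the drift
    sigma = statistics.stdev(diffs)      # assumed (n - 1) denominator
    return drift, sigma

def simulate_kappa(kappa_last, drift, sigma, horizon, n_paths, rng):
    """n_paths simulated trajectories kappa_{tn+1}, ..., kappa_{tn+horizon}."""
    paths = []
    for _ in range(n_paths):
        k, path = kappa_last, []
        for _ in range(horizon):
            k += drift + rng.gauss(0.0, sigma)   # one step of the recursion
            path.append(k)
        paths.append(path)
    return paths

def lc_log_rates(alpha, beta, kappa_path):
    """log m_{x,t} = alpha_x + beta_x * kappa_t for one simulated path."""
    return [[a + b * k for k in kappa_path] for a, b in zip(alpha, beta)]
```

Averaging `lc_log_rates` over the simulated paths gives the point forecast, and percentiles across paths (for example via `statistics.quantiles`) give prediction intervals.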

Parameter uncertainty

Prediction intervals must also reflect the fact that the parameters $\alpha_x$, $\beta_x$ and $\kappa_t$ are estimated rather than observed values. In order to properly reflect uncertainty in the forecasts, the uncertainty associated with these estimates must be taken into account. This can be achieved using the semiparametric bootstrap approach proposed by Brouhns et al. (2005). $B$ bootstrap samples of death numbers $d^b_{x,t}$, $b = 1, ..., B$, are generated by sampling from the Poisson distribution with a mean equal to the observed death numbers $d_{x,t}$. Each bootstrapped sample $d^b_{x,t}$ is then used to re-calibrate the LC model to obtain $B$ bootstrapped parameter estimates $\hat{\alpha}^b_x$, $\hat{\beta}^b_x$, $\hat{\kappa}^b_t$, $b = 1, ..., B$. To obtain prediction intervals for $\log m_{x,t_n+h}$ that incorporate parameter uncertainty, the forecasting approach is applied to each of the $B$ bootstrapped samples. Provided that the number of bootstrapped samples $B$ is large enough, one simulation path of the random walk with drift per bootstrap sample is enough to appropriately reflect both the randomness in the random walk with drift process and the uncertainty associated with the parameters $\alpha_x$, $\beta_x$ and $\kappa_t$.
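The resampling step of this bootstrap can be sketched as below. This is an illustrative Python outline, not the StMoMo implementation: `refit_and_simulate` is a hypothetical user-supplied callback standing in for "re-calibrate the model and simulate one path", and the Poisson sampler is a simple standard-library construction whose running time grows linearly with the mean, so it is only meant for small examples.

```python
import random

def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by counting unit-rate arrivals before time lam."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def bootstrap_deaths(deaths, n_boot, rng):
    """Resample every cell d_{x,t} from Poisson(d_{x,t}), n_boot times."""
    return [[[sample_poisson(d, rng) for d in row] for row in deaths]
            for _ in range(n_boot)]

def bootstrap_forecasts(deaths, n_boot, refit_and_simulate, rng):
    """One re-calibration and one simulated path per bootstrap sample.

    refit_and_simulate(sample, rng) is a hypothetical callback supplied
    by the user; it is not part of any library.
    """
    return [refit_and_simulate(sample, rng)
            for sample in bootstrap_deaths(deaths, n_boot, rng)]
```

Passing one callback result per bootstrap sample mirrors the remark that a single simulated path per sample suffices when $B$ is large.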


Technical Details for the Cairns-Blake-Dowd M7 model

Evaluating Residual Patterns

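Scaled deviance residuals as defined in (6) are used to evaluate residual patterns. For a Binomial distribution of deaths, the deviances are defined as:

$$ \operatorname{dev}(x,t) = 2\left[ d_{x,t} \log\left(\frac{d_{x,t}}{\hat{d}_{x,t}}\right) + (\mathrm{IETR}_{x,t} - d_{x,t}) \log\left(\frac{\mathrm{IETR}_{x,t} - d_{x,t}}{\mathrm{IETR}_{x,t} - \hat{d}_{x,t}}\right) \right]. $$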
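For comparison with the Poisson case, the Binomial deviance of a single cell can be written as a small function. This is a Python sketch with hypothetical names; `exposure` plays the role of $\mathrm{IETR}_{x,t}$, and terms of the form $0 \log 0$ are set to zero by the usual convention.

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention that the term vanishes when x = 0."""
    return x * math.log(y) if x > 0 else 0.0

def binomial_deviance(d_obs, d_fit, exposure):
    """dev(x, t) for d_obs deaths among `exposure` initially exposed lives."""
    survive_obs = exposure - d_obs    # observed survivors
    survive_fit = exposure - d_fit    # expected survivors under the model
    return 2.0 * (_xlogy(d_obs, d_obs / d_fit)
                  + _xlogy(survive_obs, survive_obs / survive_fit))
```

The resulting deviances can be scaled and signed exactly as in (6).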

Forecasting

The forecasting approach for the CBD M7 model is quite similar to that of the LC model. The calibrated period indices $\kappa^{(1)}_t$, $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$, $t = t_0, ..., t_n$, are modeled as a multivariate random walk with drift, which is also used by Cairns et al. (2006, 2011) and Haberman and Renshaw (2011). Letting $\kappa_t = (\kappa^{(1)}_t, \kappa^{(2)}_t, \kappa^{(3)}_t)'$, the following model for $\kappa_t$ is estimated using OLS:

$$ \kappa_t = \Delta + \kappa_{t-1} + \varepsilon_t, \quad \text{where } \varepsilon_t \sim (0, \Sigma), $$

where $\Delta$ is the three-dimensional drift term and $\Sigma$ is the $3 \times 3$ covariance matrix of the innovations $\varepsilon_t$. Point forecasts $\hat{\kappa}_{t_n+h}$ and simulated values $\{\kappa_{t_n+h,j}\}_{j=1}^{B}$ can be obtained by a straightforward generalization of the forecasting approach described for the LC model, see Haberman and Renshaw (2011).
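The multivariate generalization can be sketched as follows. This Python example assumes Gaussian innovations (the model statement above only fixes their mean and covariance) and a positive definite covariance matrix; the function names are hypothetical.

```python
import math
import random

def drift_and_covariance(kappa):
    """kappa: list of period-index vectors, one per calendar year."""
    diffs = [[b - a for a, b in zip(prev, cur)]
             for prev, cur in zip(kappa, kappa[1:])]
    n, dim = len(diffs), len(diffs[0])
    drift = [sum(d[i] for d in diffs) / n for i in range(dim)]
    cov = [[sum((d[i] - drift[i]) * (d[j] - drift[j]) for d in diffs) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return drift, cov

def cholesky(a):
    """Lower-triangular factor of a positive definite matrix."""
    dim = len(a)
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def simulate_path(kappa_last, drift, cov, horizon, rng):
    """One simulated path, assuming Gaussian innovations (an assumption)."""
    low, dim = cholesky(cov), len(drift)
    state, path = list(kappa_last), []
    for _ in range(horizon):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlated shock = Cholesky factor times independent normals.
        shock = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        state = [s + d + e for s, d, e in zip(state, drift, shock)]
        path.append(state)
    return path
```

The Cholesky factor is what carries the cross-correlation between the three period indices into the simulated shocks.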

One of the main challenges when forecasting stochastic mortality models is specifying the dynamics of the cohort effect, as pointed out by Currie (2016). We assume that the cohort index $\gamma_c$ follows an ARIMA process that is independent from the period indices $\kappa_t$. This simplifying assumption is also made in Renshaw and Haberman (2006) and Cairns et al. (2011). We follow Cairns et al. (2011) and model the calibrated cohort index $\gamma_c$ as an AR(1) with a constant, so that:

$$ \gamma_c = \delta_c + \alpha_c(\gamma_{c-1} - \delta_c) + \varepsilon_c, \quad \text{where } \varepsilon_c \sim N(0, \sigma_c^2). $$

To estimate, forecast and simulate from the ARIMA process, the R package forecast (Hyndman and Khandakar (2008); Hyndman et al. (2017)) is used.
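To make the AR(1) dynamics concrete, the following Python sketch iterates the recursion directly. It is a stand-in for illustration only and does not reproduce the forecast package; the parameter names `mean`, `phi` and `sigma` correspond to $\delta_c$, $\alpha_c$ and $\sigma_c$ and are assumed to be already estimated.

```python
import random

def ar1_point_forecast(gamma_last, mean, phi, horizon):
    """E[gamma_{c+h}] = mean + phi**h * (gamma_c - mean), h = 1..horizon."""
    return [mean + phi ** h * (gamma_last - mean)
            for h in range(1, horizon + 1)]

def ar1_simulate(gamma_last, mean, phi, sigma, horizon, rng):
    """One simulated path of the mean-reverting cohort index."""
    g, path = gamma_last, []
    for _ in range(horizon):
        g = mean + phi * (g - mean) + rng.gauss(0.0, sigma)
        path.append(g)
    return path
```

For $|\alpha_c| < 1$ the point forecast decays geometrically towards $\delta_c$, which is the main qualitative difference from the random walk used for the period indices.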

To obtain simulated values for the h-step ahead forecast of the logit-transformed age-specific one-year death probabilities $\operatorname{logit} q_{x,t_n+h}$, the simulated values $\kappa_{t_n+h,j}$ are plugged into the calibrated CBD M7 model:

$$ \operatorname{logit} q_{x,t_n+h,j} = \kappa^{(1)}_{t_n+h,j} + \kappa^{(2)}_{t_n+h,j}(x - \bar{x}) + \kappa^{(3)}_{t_n+h,j}\left((x - \bar{x})^2 - \hat{\sigma}_x^2\right) + \gamma_{t_n+h-x}. $$

Point forecasts are obtained by taking the mean of $\operatorname{logit} q_{x,t_n+h,j}$ over the simulation samples and prediction intervals can be obtained by calculating the appropriate percentiles.
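The M7 predictor for a single age and simulated draw can be evaluated as in the Python sketch below. The names are hypothetical, and $\hat{\sigma}_x^2$ is taken to be the mean of $(x - \bar{x})^2$ over the calibration ages, which is the usual definition for this model but is not restated in this appendix.

```python
import math

def m7_logit_q(age, ages, k1, k2, k3, gamma_cohort):
    """logit q for one age, given simulated period and cohort indices."""
    x_bar = sum(ages) / len(ages)
    # Assumed definition: mean squared deviation of the calibration ages.
    sigma2 = sum((a - x_bar) ** 2 for a in ages) / len(ages)
    centred = age - x_bar
    return k1 + k2 * centred + k3 * (centred ** 2 - sigma2) + gamma_cohort

def inv_logit(v):
    """Map a logit value back to a probability."""
    return 1.0 / (1.0 + math.exp(-v))
```

For example, `inv_logit(m7_logit_q(60, range(60, 90), -3.0, 0.1, 0.001, 0.0))` returns the implied one-year death probability at age 60 for those illustrative index values.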


Parameter uncertainty

To produce prediction intervals that appropriately incorporate the estimation uncertainty of $\kappa_t$ and $\gamma_c$, semiparametric bootstrapping is employed. The approach is similar to the bootstrapping approach employed for the LC model. Because the CBD M7 model assumes binomially distributed death numbers, a suitable adaptation of the Brouhns et al. (2005) approach is used, see Villegas et al. (2015).

Alternative Calibration Period

The point forecasts at advanced ages displayed a very pronounced kink, see Figure 10. To check whether I had applied the model correctly, I also calibrated the model to the time period 1981-2004. The result is shown in Figure 16. The implausible kink disappears when the alternative calibration sample is used.

Figure 16: Point forecasts of mortality rates $m_{x,t}$ produced by the CBD M7 model, based on the calibration periods 1961-2004 (black) and 1981-2004 (red).


Technical Details for the Hyndman-Ullah model

Evaluating Residual Patterns

For the HU method, the residuals are the differences between the observed and fitted logarithms of mortality rates rather than death numbers. Because of this difference in scale, the size of these residuals is not comparable to that of the deviance residuals based on death numbers. The residuals are not standardized, because it is not directly apparent how to standardize them with the standard output from the demography package (Hyndman et al. (2017)).

Forecasting

After calibration we have the following decomposition:

$$ \log f_{x,t} = \theta_x + \sum_{k=1}^{4} \beta_{t,k}\phi_{x,k} + e_{x,t}. \tag{8} $$

To produce forecasts, we need only specify time series dynamics for the $\beta_{t,k}$ coefficients, which are mutually uncorrelated by virtue of the calibration method. Therefore, the time series dynamics can be specified independently. To obtain forecasts for each of the coefficients $\{\beta_{t,1}, ..., \beta_{t,4}\}$, univariate robust ARIMA models (Chen and Liu (1993)) are used. The method of Chen and Liu (1993) allows the fitted ARIMA models to contain outliers of various types, so that these unusual observations do not contaminate the forecast.

The order of the ARIMA models is chosen on the basis of the AICc criterion using the demography package (Hyndman and Khandakar (2008); Hyndman et al. (2017)). In principle, this R package allows the maximum order of the ARIMA models to be specified, but we have chosen not to do so because it requires modifying the internals of the demography package (Hyndman et al. (2017)) used to apply the HU method. The ARIMA models selected by the automatic selection procedure are shown in Table 8.

Coefficient     | Time series model
$\beta_{t,1}$   | ARIMA(2,2,1)
$\beta_{t,2}$   | Random walk without drift
$\beta_{t,3}$   | AR(1) without drift
$\beta_{t,4}$   | ARIMA(1,0,2) without drift

Table 8: ARIMA models specified for the coefficients $\{\beta_{t,1}, ..., \beta_{t,4}\}$.


Prediction intervals, simulated sample paths and parameter uncertainty

The smooth h-step ahead forecasts of $\log m_{x,t}$ are then obtained using

$$ \log \hat{m}_{x,t_n+h} = \theta_x + \sum_{k=1}^{4} \hat{\beta}_{t_n+h,k}\phi_{x,k}, \tag{9} $$

where $\hat{\beta}_{t_n+h,k}$ denotes the h-step ahead forecast of $\beta_{t_n+h,k}$ using the estimated time series $\{\beta_{t,1}, ..., \beta_{t,4}\}$, and $t_n$ is the last calendar year observed in the calibration sample.

The forecast variance follows from (8). Because each principal component is approximately orthogonal to the other components, the forecast variance can be approximated by the sum of component variances. The forecast variance accounts for parameter uncertainty, uncertainty stemming from the smoothing method and uncertainty stemming from the ARIMA forecasts, see Hyndman and Ullah (2007) for details. The authors assume that the various sources of error are all normally distributed, which implies that prediction intervals can be constructed from the appropriate percentiles of the normal distribution. Hyndman and Ullah (2007) note that they have not detected any non-normality in demographic applications, so this assumption seems justified. Simulated future sample paths can be generated by drawing independent normal shocks with variance equal to the forecast variance and adding the shocks thus obtained to (9), in similar spirit to the LC and CBD M7 methods.
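The role of orthogonality in the variance calculation can be illustrated with a small Python sketch. It covers only the contribution of the coefficient forecasts: when the $\beta_{t,k}$ are uncorrelated, the variance of $\sum_k \beta_{t_n+h,k}\phi_{x,k}$ is $\sum_k \phi_{x,k}^2 \operatorname{Var}(\beta_{t_n+h,k})$. The remaining variance components described by Hyndman and Ullah (2007) are deliberately left out, and all names are hypothetical.

```python
import math

def hu_point_forecast(theta_x, beta_fc, phi_x):
    """Right-hand side of (9) for a single age x."""
    return theta_x + sum(b * p for b, p in zip(beta_fc, phi_x))

def hu_coefficient_variance(beta_var, phi_x):
    """Variance contributed by uncorrelated coefficient forecasts only."""
    return sum(v * p * p for v, p in zip(beta_var, phi_x))

def normal_interval(mean, variance, z=1.959964):
    """Symmetric normal interval; the default z gives roughly 95% coverage."""
    half = z * math.sqrt(variance)
    return mean - half, mean + half
```

In a full implementation the other variance components would be added to `hu_coefficient_variance` before the interval is formed.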

Technical Details for the 2D KS model

Evaluating Residual Patterns

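For the 2D KS model we use the standardized residuals to evaluate patterns in the residuals. These are defined as:

$$ \frac{\log m_{x,t} - \hat{\Gamma}(x,t)}{\sqrt{s^2_{x,t}}}, $$

where $s^2_{x,t} = G_{x,t} - \left(\hat{\Gamma}(x,t)\right)^2$, with:

$$ G_{x,t} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{T} K_{h_1,h_2,\rho}(x_i - x, t_j - t)\left(\log m_{x_i,t_j}\right)^2}{\sum_{i=1}^{N} \sum_{j=1}^{T} K_{h_1,h_2,\rho}(x_i - x, t_j - t)}. $$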

The size of the standardized residuals is incomparable to those produced by the other mortality models, but the residuals can still be evaluated in isolation.
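The standardized residuals can be computed with two kernel-weighted averages, one of the values and one of their squares. The Python sketch below is illustrative only: the kernel is assumed to have a bivariate Gaussian form with bandwidths $h_1$, $h_2$ and correlation $\rho$ (its definition is given in the main text, not in this appendix), `values` holds $\log m_{x_i,t_j}$ as `[age][year]`, and the function names are hypothetical.

```python
import math

def kernel(u, v, h1, h2, rho):
    """Assumed bivariate Gaussian-type kernel (constant factor omitted)."""
    a, b = u / h1, v / h2
    return math.exp(-(a * a - 2.0 * rho * a * b + b * b)
                    / (2.0 * (1.0 - rho * rho)))

def nw_smooth(values, ages, years, x, t, h1, h2, rho, power=1):
    """Kernel-weighted average of values[i][j] ** power around (x, t)."""
    num = den = 0.0
    for i, xi in enumerate(ages):
        for j, tj in enumerate(years):
            w = kernel(xi - x, tj - t, h1, h2, rho)
            num += w * values[i][j] ** power
            den += w
    return num / den

def standardized_residual(values, ages, years, i, j, h1, h2, rho):
    """(log m - Gamma) / sqrt(G - Gamma**2) at grid point (i, j)."""
    x, t = ages[i], years[j]
    gamma = nw_smooth(values, ages, years, x, t, h1, h2, rho)
    g2 = nw_smooth(values, ages, years, x, t, h1, h2, rho, power=2)
    return (values[i][j] - gamma) / math.sqrt(g2 - gamma ** 2)
```

The normalizing constant of the kernel cancels in the ratio, so it is omitted; the sketch requires $|\rho| < 1$ and a surface that is not locally constant.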

Forecasting

Forecasting logarithms of age-specific mortality rates $\log m_{x,t}$ is done by sequentially calculating one-step ahead forecasts. Let $T$ be the number of calendar years used to calibrate the model and let $t_j = \frac{j}{T+1}$, for $j = 1, 2, ..., T+1$, with $t_{T+1} = 1$. The calendar years are thus normalized to an equidistantly spaced grid on the interval $[\frac{1}{T+1}, 1]$. The one-step ahead forecast at age $x$ and calendar year $T+1$ is obtained by minimizing equation (3) using the newly defined $t_j$ and the calibrated bandwidth parameters $h_1$, $h_2$ and $\rho$ based on the original estimation sample. That is, the one-step ahead forecast at age $x$ is obtained by:

$$ \hat{\Gamma}_{x,t_n+1} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{T} K_{h_1,h_2,\rho}(x_i - x, t_j - t_0) \log m_{x_i,t_j}}{\sum_{i=1}^{N} \sum_{j=1}^{T} K_{h_1,h_2,\rho}(x_i - x, t_j - t_0)}, $$

where $t_n$ is the last year used to calibrate the model and $(x, t_0)$ is the point at which the kernel is centered. Repeating the forecasting procedure $h-1$ times, the h-step ahead forecasts can be obtained.
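A sequential one-step ahead forecast of this kind can be sketched as below. Several points are assumptions of this Python illustration rather than statements from the thesis: the kernel is given a bivariate Gaussian form, ages are used on their raw scale, the kernel is centered at $t_0 = t_{T+1} = 1$, and each forecast year is appended to the sample (with the time grid re-normalized) before the next step.

```python
import math

def kernel(u, v, h1, h2, rho):
    """Assumed bivariate Gaussian-type kernel (constant factor omitted)."""
    a, b = u / h1, v / h2
    return math.exp(-(a * a - 2.0 * rho * a * b + b * b)
                    / (2.0 * (1.0 - rho * rho)))

def one_step_forecast(log_m, ages, h1, h2, rho):
    """Forecast log m for year T + 1 at every age; log_m is [age][year]."""
    n_years = len(log_m[0])
    grid = [(j + 1) / (n_years + 1) for j in range(n_years)]   # t_1 .. t_T
    t0 = 1.0                                  # assumed target point t_{T+1}
    forecasts = []
    for x in ages:
        num = den = 0.0
        for i, xi in enumerate(ages):
            for j, tj in enumerate(grid):
                w = kernel(xi - x, tj - t0, h1, h2, rho)
                num += w * log_m[i][j]
                den += w
        forecasts.append(num / den)
    return forecasts

def forecast(log_m, ages, h1, h2, rho, horizon):
    """Repeat the one-step forecast, appending each new year to the sample."""
    surface = [list(row) for row in log_m]    # copy; the input is not modified
    result = []
    for _ in range(horizon):
        step = one_step_forecast(surface, ages, h1, h2, rho)
        for row, value in zip(surface, step):
            row.append(value)
        result.append(step)
    return result
```

Because every forecast is a weighted average of existing values, repeated calls with fixed bandwidths always return the same numbers, which is the determinism noted in the discussion of prediction intervals.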

Li et al. (2016) do not discuss prediction intervals. We have attempted to produce prediction intervals, but found that the forecasting method does not take into account the stochastic nature of future mortality. That is, given the estimation sample and the calibrated smoothing parameters $h_1$, $h_2$ and $\rho$, forecasts are deterministic. Therefore, this forecasting method produces implausibly narrow prediction intervals. For this reason, prediction intervals for the 2D KS method are not used throughout this thesis and we only use the point forecasts $\hat{\Gamma}_{x,t_n+h}$.

Parameter uncertainty

We do not display prediction intervals for the 2D KS model throughout the thesis for the above reason, but we found that it is possible to allow for parameter uncertainty. The Nadaraya-Watson estimator has an asymptotically normal distribution under the assumption of homoskedastic errors and some fairly general assumptions, see Li and Racine (2007). Intervals based on such an asymptotic result may be problematic in two respects. First, they are based on a Central Limit Theorem, which may not be a good finite-sample approximation. Second, they do not take into account the bias produced by any nonparametric regression method. The problem of bias may be diminished by employing methods that choose bandwidths suboptimally, leading to undersmoothing (see e.g., Hall (1993) and Neumann and Polzehl (1998)). Another approach is to circumvent the selection of suboptimal bandwidths by using a bootstrap approach to provide bias-correction (see e.g., Härdle and Bowman (1988), Yu and Jones (2004), Politis (2014)). We have implemented the model-based bootstrap approach suggested by Politis (2014) to allow for parameter uncertainty. However, the results are not discussed in the thesis, because we have not been able to implement a method to produce prediction intervals that appropriately reflect the stochastic nature of mortality.


Summary of Modelling Steps

1. Calibrate the LC, HU and 2D KS models to the observed logarithms of mortality rates $\log m_{x,t}$ for ages 60-89 and calendar years 1961-2004. Calibrate the CBD M7 model to observed logit-transformed one-year death probabilities, for ages 60-89 and calendar years 1961-2004.

2. For the LC and HU models, use the time-series models to produce B = 5,000 simulated sample paths of age-specific mortality rates $\log m_{x,t}$ for calendar years 2005-2100 and ages 60-89. For the CBD M7 model, use the time-series models to produce B = 5,000 simulated sample paths of age-specific one-year death probabilities $q_{x,t}$ for the same calendar years and ages. The point forecast is then obtained by taking the mean value at age $x$ and calendar year $t$ over all 5,000 simulated values. Prediction intervals are obtained by taking the appropriate percentile values. For the 2D KS model, only produce a point forecast using the original calibration sample.

3. For all models and each calendar year and simulated sample path separately, extrapolate the simulated mortality rates $\log m_{x,t}$ to ages 90-120 using the Kannisto (1994) mortality law. For the CBD M7 model, first translate the simulated one-year death probabilities to mortality rates by using the identity $q_{x,t} = 1 - \exp(-m_{x,t})$, that is, $m_{x,t} = -\log(1 - q_{x,t})$, then apply the Kannisto (1994) model, and finally translate back to one-year death probabilities. Point forecasts and prediction intervals of $\log m_{x,t}$ and $\log q_{x,t}$ at ages 90-120 are obtained by taking the mean and appropriate quantiles, after the extrapolation step.

4. For each simulation separately, estimate the annuity present value $a_{x,t}$ using the appropriate forecast mortality rates up to age 120. This gives a set of 5,000 estimates of $a_{x,t}$, from which the density function of the annuity present value is estimated. The point forecast, mean, 90th and 95th percentile are computed over the 5,000 simulated sample paths. For the 2D KS model, only point forecasts of $a_{x,t}$ are calculated.
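Steps 3 and 4 can be illustrated for a single simulated path with the Python sketch below. It rests on several assumptions that are not taken from the thesis: the Kannisto law is fitted by ordinary least squares on the logit of the mortality rates (the law states that $\operatorname{logit} m_x$ is linear in age), the annuity pays 1 at the end of each year survived, and the interest rate is flat. All function names are hypothetical.

```python
import math

def q_from_m(m):
    """One-year death probability from a mortality rate."""
    return 1.0 - math.exp(-m)

def m_from_q(q):
    """Mortality rate from a one-year death probability."""
    return -math.log(1.0 - q)

def fit_kannisto(ages, rates):
    """OLS of logit(m_x) on age (assumed fitting method); returns (a, b)."""
    y = [math.log(m / (1.0 - m)) for m in rates]
    n = len(ages)
    x_bar, y_bar = sum(ages) / n, sum(y) / n
    slope = (sum((x - x_bar) * (v - y_bar) for x, v in zip(ages, y))
             / sum((x - x_bar) ** 2 for x in ages))
    return y_bar - slope * x_bar, slope

def kannisto_rate(age, intercept, slope):
    """Extrapolated mortality rate; bounded above by 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * age)))

def annuity_value(death_probs, interest):
    """Present value of 1 paid at the end of each year survived."""
    v, survival, value = 1.0 / (1.0 + interest), 1.0, 0.0
    for k, q in enumerate(death_probs, start=1):
        survival *= 1.0 - q          # probability of surviving k years
        value += v ** k * survival
    return value
```

Applying `annuity_value` to each of the 5,000 simulated paths of death probabilities yields the sample from which the mean and the upper percentiles of the annuity present value are taken.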
