
Q. J. R. Meteorol. Soc. (2006), 132, pp. 1349–1369    doi: 10.1256/qj.05.167

Probabilistic forecasting from ensemble prediction systems: Improving upon the best-member method by using a different weight and dressing kernel for each member

By VINCENT FORTIN¹†, ANNE-CATHERINE FAVRE² and MÉRIEM SAÏD²

¹Numerical Weather Prediction Research, Environment Canada, Dorval, Canada
²NSERC/Hydro-Québec Chair in Statistical Hydrology, INRS-ETE, Québec City, Canada

† Corresponding author: Numerical Weather Prediction Research, Environment Canada, Canadian Meteorological Centre, 2121 North Service Road, Trans-Canada Highway, Dorval (Québec), Canada H9P 1J3. e-mail: vincent.fortin@ec.gc.ca

(Received 4 August 2005; revised 30 January 2006)

© Royal Meteorological Society, 2006.

SUMMARY

Ensembles of meteorological forecasts can both provide more accurate long-term forecasts and help assess the uncertainty of these forecasts. No single method has however emerged to obtain large numbers of equiprobable scenarios from such ensembles. A simple resampling scheme, the 'best member' method, has recently been proposed to this effect: individual members of an ensemble are 'dressed' with error patterns drawn from a database of past errors made by the 'best' member of the ensemble at each time step. It has been shown that the best-member method can lead to both underdispersive and overdispersive ensembles. The error patterns can be rescaled so as to obtain ensembles which display the desired variance. However, this approach fails in cases where the undressed ensemble members are already overdispersive. Furthermore, we show in this paper that it can also lead to an overestimation of the probability of extreme events. We propose to overcome both difficulties by dressing and weighting each member differently, using a different error distribution for each order statistic of the ensemble. We show on a synthetic example and using an operational ensemble prediction system that this new method leads to improved probabilistic forecasts, whether the undressed ensemble members are underdispersive or overdispersive.

KEYWORDS: Bayesian model averaging  Dressing kernel  Ensemble augmentation  Overdispersive  Underdispersive

1. INTRODUCTION

While atmospheric and hydrologic forecasts are certainly useful for a wide range of applications, the expected value of a forecast can be negative if the uncertainty of the forecast is not correctly taken into account (Davis et al. 1979). For complex decision-making problems, especially when the decision process has been formalized and parametrized so that it can be optimized by stochastic programming methods, the best way to carry information on the joint probability distribution of uncertain atmospheric or hydrologic forecasts is probably to provide the user with a large number of scenarios which are drawn from the probability distribution of the predictand, conditional on the forecasts. In practice, if this probability distribution is not provided by the forecasting system, it will be derived by the user from past experience with the forecast, which can lead to the misuse of forecasts whose accuracy varies with time, such as precipitation forecasts or long-term temperature forecasts (Houdant 2004). It is therefore preferable that the forecasting system provide some information on the sharpness and reliability of the forecasts. Ensemble prediction systems (EPS) offer this possibility (Sivillo et al. 1997). EPS outputs can be very useful to assess the uncertainty of a deterministic forecast, i.e. there can be a relationship between the spread of the ensemble and the skill of the forecast, but this does not mean that the empirical distribution of the ensemble members provides a useful probabilistic forecast, both because the number of ensemble members is limited by computing resources, but also



because EPS forecasts are not necessarily reliable: they can be biased, and typically do not display enough variability, thus leading to an underestimation of the uncertainty (Buizza et al. 2005).

Different approaches have been proposed recently to build reliable probabilistic forecasts from such ensembles, including Bayesian model averaging (Raftery et al. 2005), the Bayesian processor of output (Krzysztofowicz 2004) and the best-member method (Roulston and Smith 2003), which is by far the simplest to implement amongst these three methods. While we agree with Krzysztofowicz (2004) that Bayesian theory provides the appropriate theoretical framework for obtaining the probability distribution of a predictand conditional on an ensemble of model outputs, we do feel that there is at the moment a need for simpler methods which can be readily implemented. Roulston and Smith (2003), hereafter RS03, have recently proposed the use of a simple resampling scheme, the 'best member' method: individual members of an ensemble are 'dressed' with an error distribution derived from the error made by the 'best' member of the ensemble. Wang and Bishop (2005), hereafter WB05, have however shown by stochastic simulations that the best-member method can lead both to underdispersive and overdispersive ensembles. They propose an improved method where the error distribution is rescaled so as to obtain ensembles which display the desired variance, and apply it both to a synthetic EPS and a real EPS based on a dynamical atmospheric model. This approach however fails in cases where the undressed ensemble members are already overdispersive. We propose to overcome this difficulty by dressing and weighting each member differently.

While underdispersion is a common feature of an EPS, overdispersion is also observed for some variables (Feddersen and Andersen 2005). In particular, if the perturbations in the analysis do not represent the observation error, but are tuned so that the EPS exhibits the right amount of variability at some lead time for some variables, then there is a real possibility that it will be overdispersive for longer lead times or other variables (Legg and Mylne 2004). It is also possible to observe overdispersion in a multimodel multianalysis EPS, because individual ensemble members can produce drastically different results (Eckel 2003).

To demonstrate that dressing and weighting each member of an EPS differently can improve the reliability of the forecast, we focus in this paper on the problem of univariate forecasting. The proposed method could be adapted for multivariate forecasting problems, and we will suggest strategies to that effect, but a detailed analysis of all the necessary design choices is outside the scope of this paper. Clearly, in many cases the predictand is multidimensional. However, in a good number of these cases, this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case for example for hydrological forecasting, where the inputs are certainly multidimensional, but the output is very often unidimensional.

The paper is organized as follows: in the next section, we present the best-member method of RS03 and introduce the improved best-member method of WB05. In the third section, we present the synthetic example used by WB05 to illustrate their method, recall their main experimental results, and show that rescaling the error patterns leads in this case to probabilistic forecasts having heavier tails than the observations. In the fourth section, we propose a different correction technique for the best-member method, which uses a different error distribution and a different weight for each member of the ensemble. We show that this method leads to forecasts which not only have the right amount of variance, both for underdispersive and overdispersive EPS, but which also have better tail behaviour. The method is illustrated in section 5 using outputs from the


GFS reforecast experiment (Hamill et al. 2006). Results are discussed in section 6, and a brief conclusion follows.

2. THE BEST-MEMBER METHOD

A numerical weather prediction system provides forecasts for different weather elements on a grid at different lead times, using some information a_{t0} on the state of the atmosphere at the initial time t0 and a dynamical model of the atmosphere, x_t = g(a_{t0}). To assess the impact of uncertainty on the analysis a_{t0} and on the dynamical model structure g(·), it is possible to run different models with slightly different initial conditions. Let y_t = {y_{tj}, j = 1, 2, ..., J} be the vector of unknowns that the system is forecasting at time t, i.e. y_{tj} corresponds to one weather element at one location at a given lead time t (thus index j denotes one weather element at one location). Let x_{tk} = {x_{tkj}} be a forecast of y provided by the kth member of the EPS, and X_t = {x_{tk}, k = 1, 2, ..., K} denote the set of all ensemble members. Our objective is to obtain a probabilistic prediction of y_t given X_t, i.e. the predictive distribution p(y_t | X_t). In practice, we are in fact essentially interested in predictive simulations from p(y_t | X_t), i.e. scenarios y_m, m = 1, 2, ..., M, sampled from p(y_t | X_t). Furthermore, we would often like to be able to obtain many more scenarios than there are ensemble members, i.e. M ≫ K, as we would like to be able to estimate the probability associated with extreme events.

RS03 have recently introduced the 'best-member' method to obtain scenarios which appear to be sampled from p(y_t | X_t). The idea is to 'dress' each ensemble member x_{tk} with a probability distribution representing the error made by this member when it happens to be the best member of the ensemble. Amongst the ensemble members x_{tk}, define the best member x*_t as the one which minimizes ||y_t − x_{tk}|| for some given norm ||·||:

x*_t = arg min_{x_{tk}} ||y_t − x_{tk}||    (1)

For a given norm and a database of past forecasts, build a probability distribution from realizations of ε*_t = y_t − x*_t. Finally, define a probabilistic forecast which consists of a finite mixture of error distributions centred on each original member of the ensemble:

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} p_{ε*}(y_t − x_{tk})    (2)

Simulating from this distribution is simple: to obtain M = N · K simulations, dress each of the K members of the ensemble by sampling N error patterns ε_{tkn}, n = 1, 2, ..., N, from a database of past forecasts, and add each error pattern to this ensemble member:

y_{tkn} = x_{tk} + ε_{tkn}    (3)

We will refer to X_t = {x_{tk}} as a dynamical ensemble, since it is obtained from a dynamical model of the atmosphere, and to Y_t = {y_{tkn}} as a statistical ensemble. WB05 have shown that the best-member method does not lead to reliable forecasts. Reliability is generally a desirable feature for a probabilistic forecasting system. If the forecasted system is stationary, a probabilistic forecasting system is said to be reliable if we cannot tell apart a large set of observed values from a large set of forecasts. This is obviously the case for a perfect forecast, but also if the forecasts are simply issued
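As a concrete illustration, the bookkeeping of Eqs. (1)–(3) for the univariate case can be sketched in a few lines of NumPy (a sketch only; the function and variable names are ours, not from the paper):

```python
import numpy as np

def best_member_errors(y, X):
    """Eq. (1): for each past forecast, identify the best member
    (smallest absolute error) and record its error y_t - x*_t."""
    # y: (T,) observations; X: (T, K) dynamical ensemble forecasts
    best = np.argmin(np.abs(y[:, None] - X), axis=1)
    return y - X[np.arange(len(y)), best]

def dress_ensemble(X_new, error_db, N, rng):
    """Eqs. (2)-(3): dress each of the K members with N error patterns
    resampled from the archive, giving a statistical ensemble of M = N*K."""
    T, K = X_new.shape
    eps = rng.choice(error_db, size=(T, K, N))  # resample past errors with replacement
    return X_new[:, :, None] + eps              # y_tkn = x_tk + eps_tkn
```

In practice `error_db` would be built by running `best_member_errors` over a calibration archive, and `dress_ensemble` applied to new forecasts.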


by drawing randomly from the stationary distribution of the observed process, which would correspond to climatology for weather elements.

Conditions under which the best-member method leads to reliable forecasts are not known. On the other hand, one can think of many examples where it fails. Consider for example an EPS which provides forecasts having the right amount of variance, but no skill. Unless the number of members is so large that the best-member error is always zero, the best-member error distribution will have a variance larger than zero. As the best-member method adds uncorrelated noise to each member, it increases the variance. Hence, the statistical ensemble will necessarily have too much variance if the dynamical ensemble already had enough.

(a) Rescaling the error patterns to obtain the right amount of variance

WB05 suggest that the error patterns be rescaled so that the covariance of any two randomly picked dressed ensemble members is the same as the covariance of the observations with any one of those two members. This technique has two disadvantages: first, it means that error patterns get distorted, so that one is adding to the ensemble members error patterns that did not actually occur, and which may be unfeasible. Secondly, the technique is only applicable when the dynamical ensemble is underdispersive. Indeed, as we still add uncorrelated noise to the dynamical ensemble, we can only increase the variance.

In the conclusion of their paper, WB05 mention that a possible solution to the second problem would be to weight each ensemble member differently. We show in this paper that this can solve both problems. The rationale for treating ensemble members differently, even when all members of the ensemble are generated identically and independently, is as follows: when we dress an ensemble member, we already know the outcome for each member. We can thus use information from the other members, in particular where it lies in the ensemble, measured for example by the distance from the ensemble mean, using the norm already defined to identify the best member of the ensemble. Consider an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the ensemble convex hull. Hence, a member which is on the outside of this hull will have more chance of being the best member of the ensemble, and thus can be given more weight. A similar argument can be made for an EPS which produces overdispersive dynamical ensembles: smaller weights should be given to ensemble members which are further away from the ensemble mean.

3. A SYNTHETIC EPS SYSTEM

We shall reuse in this paper the simple synthetic EPS set-up used by WB05 to illustrate the improvements they have proposed to the best-member method. Assume that observations y_t, t = 1, 2, ..., T, are independent normally distributed random variables with zero mean but time-dependent variance σ²_t(y). Assume also that σ²_t(y) is drawn from a chi-squared distribution with three degrees of freedom. An EPS provides a K-member forecast X_t = {x_{tk}, k = 1, 2, ..., K} for each observation y_t. All ensemble members are independent identically distributed (iid) normal variates having zero mean and time-dependent variance σ²_t(x), where σ²_t(x) is related to σ²_t(y) by a random relationship σ²_t(x) = a_t · σ²_t(y), a_t being a uniform random variable on the interval [μ_a − 0.1, μ_a + 0.1], where μ_a is the expectation of a_t.

Figure 1 illustrates the set-up using a directed acyclic graph (DAG), or Bayesian network (Jensen 2001). In an acyclic graph, circles represent unknown parameters and



Figure 1. Directed acyclic graph representing the synthetic simulation set-up used by WB05 for (a) calibration and (b) prediction.

squares represent known parameters and observations. Arrows represent dependence links. Figure 1(a) presents the DAG for the calibration period, during which the observations y_t are known (and thus represented by a square), whereas Fig. 1(b) presents the DAG for the validation period, during which the observations y_t are unknown (and thus represented by a circle).

The EPS thus displays a 'spread–skill' relationship: when the observation y_t is less variable, thus more predictable, i.e. σ²_t(y) happens to be small, the observed variance of the ensemble members will tend to be smaller, and conversely. The EPS will be underdispersive if μ_a < 1 and overdispersive if μ_a > 1. Because the variable of interest is univariate and the forecasts are unbiased, the best member x*_t of each ensemble is simply identified by minimizing the absolute difference between the observations and the forecasts.

(a) When and why the best-member method fails

To test the best-member method, WB05 have used values of K from 1 to 20, and values of μ_a from 0.1 to 0.9. As we want to illustrate a method which gives a different weight to each ensemble member, we will focus on EPS having between K = 3 and K = 20 members. We will however consider values of μ_a from 0.1 to 1.9, as we will be able to deal with overdispersive EPS. As proposed by WB05, each time we will obtain a database of past forecasts by generating T = 15 000 observations y_t and corresponding ensemble forecasts X_t. The different methods will then be compared using statistics computed on statistical ensembles Y_t = {y_{tkn}}, obtained by dressing a second set of T = 15 000 ensemble forecasts X_t. To obtain accurate statistics, N = 150 statistical ensemble members will be drawn from each ensemble member x_{tk}.

The observations y_t are drawn from an infinite mixture of normal distributions with zero mean but different variances, the variances being drawn from a chi-squared distribution with ν = 3 degrees of freedom, and thus having an expectation of E[σ²_t(y)] = ν = 3. It follows that the stationary distribution of the process y_t has zero mean, a variance σ²(y) of 3 (equal to the expectation of the chi-squared distribution) and a skewness of zero. Kurtosis, as estimated by recursive adaptive Simpson quadrature using the MATLAB function quad (Gander and Gautschi 2000), is very close to β₂ = 5. Recall that kurtosis measures the degree of peakedness of a distribution, that it is defined by the ratio of the fourth centred moment to the square of the variance, β₂ = μ₄/σ⁴, and



Figure 2. (a) Ratio of the variance of the statistical ensemble members generated by the original best-member method to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

that the kurtosis of the normal distribution is β₂ = 3 (Weisstein 2002). Kurtosis excess with respect to the normal distribution is defined by γ₂ = β₂ − 3. A distribution with a high peak and heavier tails than the normal distribution is said to be leptokurtic, and has a positive kurtosis excess, i.e. γ₂ > 0. A flat-topped curve with thin tails is said to be platykurtic, and has a negative kurtosis excess, i.e. γ₂ < 0. The stationary distribution of the process y_t, with a kurtosis excess of γ₂ = 2, is leptokurtic and thus has heavier tails than a normal distribution. The stationary distribution of the dynamical ensemble members X_t = {x_{tk}} and best-member error ε*_t also has zero mean and zero skewness, and therefore the stationary distribution of the statistical ensemble members Y_t = {y_{tkn}} will also have zero mean and zero skewness. However, their variance and kurtosis vary with K, the number of members in the EPS, and μ_a, the expected amount of underdispersion or overdispersion of the undressed members.
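The stationary kurtosis quoted above can in fact be obtained without quadrature: writing y_t = σ_t z_t with z_t standard normal and σ²_t ~ χ²₃, we have E[y⁴] = 3 E[(σ²)²] = 3(6 + 3²) = 45, so β₂ = 45/3² = 5 exactly, i.e. γ₂ = 2. A quick Monte Carlo confirmation (our own check, standing in for the MATLAB quadrature used in the paper):

```python
import numpy as np

rng = np.random.default_rng(7)
s2 = rng.chisquare(3, size=2_000_000)   # sigma2_t(y) ~ chi-squared(3)
y = rng.normal(0.0, np.sqrt(s2))        # stationary draws of the observed process
beta2 = np.mean(y**4) / np.var(y)**2    # kurtosis beta_2 = mu_4 / sigma^4
# beta2 should come out close to the exact value of 5 (gamma_2 = 2)
```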

Figure 2(a) shows the ratio of the variance of the statistical ensemble members generated by the best-member method to the variance of the observations, as a function of μ_a and K, while Fig. 2(b) shows the difference between the kurtosis of the ensemble members and the kurtosis of the observations, as a function of μ_a and K. The thick line on both graphs shows the combinations of μ_a and K for which the statistical ensemble members have the correct variance and kurtosis.

It can be seen in Fig. 2(a) that the variance of the statistical ensemble increases with the variance of the dynamical ensemble and decreases with the ensemble size. These two characteristics are not specific to this experiment, but rather much more general. Indeed, the variance of the statistical ensemble can only be larger than that of the dynamical ensemble, and as the number of ensemble members increases, the distance between the best ensemble member and the observation will generally decrease towards



Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a).

zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence, the best-member method cannot provide reliable forecasts in a general setting.

It is more difficult to explain the behaviour of the kurtosis as a function of μ_a and K, expressed by Fig. 2(b). In this particular case, the kurtosis of the statistical ensemble decreases with μ_a for μ_a < 0.5, whereas for μ_a > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μ_a. It can be seen that for μ_a < 0.5, the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence, the best-member method starts with ensemble members which already have too heavy tails. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events

WB05, having shown that the best-member method does not produce ensembles with the correct variance, propose to dress each dynamical ensemble member with an independent error distribution having still a zero mean, but a covariance chosen such that the covariance between any pair of ensemble members be the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

s² = mse(y, x̄) − (1 + 1/K) · s²_x    (4)

where mse(y, x̄) = T⁻¹ Σ_{t=1}^{T} (y_t − x̄_t)² is the mean square error between the observations y_t and the ensemble mean x̄_t = K⁻¹ Σ_{k=1}^{K} x_{kt}, computed from a database of T error forecasts and observations. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.



Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor ω = √(s²/s²_{ε*}), where s²_{ε*} = (T − 1)⁻¹ Σ_{t=1}^{T} (ε*_t)² is the estimated variance of the best-member error. The statistical ensemble members will then be given by

y_{tkn} = x_{tk} + ω · ε_{tkn}    (5)

where the innovations ε_{tkn} are drawn at random from a database of past error forecasts. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
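Equations (4) and (5) translate into a short calibration routine. In the sketch below, s²_x in Eq. (4) is read as the average sample variance of the ensemble members — an assumption on our part, consistent with s² = 0 for a perfectly reliable ensemble; the function name is ours:

```python
import numpy as np

def wb05_omega(y, X, best_errors):
    """Eqs. (4)-(5): estimate the dressing variance s2 and the factor omega
    used to rescale archived best-member errors."""
    T, K = X.shape
    xbar = X.mean(axis=1)                  # ensemble mean at each time
    mse = np.mean((y - xbar) ** 2)         # mse(y, xbar)
    s2_x = X.var(axis=1, ddof=1).mean()    # assumed reading of s2_x (mean member variance)
    s2 = mse - (1.0 + 1.0 / K) * s2_x      # Eq. (4)
    if s2 < 0:                             # ensemble already overdispersive
        raise ValueError("WB05 rescaling not applicable: s2 < 0")
    s2_eps = np.var(best_errors, ddof=1)   # variance of archived best-member errors
    return np.sqrt(s2 / s2_eps)            # omega in Eq. (5)
```

Dressed members are then x_{tk} + ω · ε, with ε resampled from the archive: ω > 1 widens the statistical ensemble of an underdispersive EPS, while a negative s² signals the overdispersive case where the method fails.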

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and roundoff errors. Figure 4(b) shows however that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence, the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its


error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u₁, ..., u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u, we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence, member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider for example an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence, the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member, otherwise it would not be the best member of the ensemble.

Hence, the probability that an ensemble member be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured for example by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members, and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_t(k) to be the kth order statistic of an ensemble X_t = {x_{t,k}, k = 1, 2, ..., K}, and

ε*_(k) = {y_t − x*_t | x*_t = x_t(k), t = 1, 2, ..., T}

to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_t(k), i.e. p_k = Pr[x*_t = x_t(k)].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μ_a = 0.3 (an underdispersive ensemble) and for μ_a = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μ_a: as predicted, the lowest and highest ensemble members have a higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μ_a = 0.3 and μ_a = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence it does not make a lot of sense to dress each ensemble member with the same error distribution.
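The estimation of p_k and of the per-rank archives ε*_(k) from a database of past forecasts can be sketched as follows. This is an illustrative sketch, not the authors' code; the function names, the synthetic data, and the use of the absolute difference to identify the best member are our own assumptions:

```python
import numpy as np

def rank_statistics(forecasts, obs):
    """Estimate p_k and the per-rank best-member error archives.

    forecasts: (T, K) array of past ensemble forecasts
    obs:       (T,)   array of verifying observations
    Returns (p, archives): p[k] estimates Pr[best member has rank k],
    and archives[k] holds the errors y_t - x_t(k) for the dates when
    the k-th order statistic was the best member.
    """
    T, K = forecasts.shape
    counts = np.zeros(K)
    archives = [[] for _ in range(K)]
    for t in range(T):
        x = np.sort(forecasts[t])                # order statistics x_t(1..K)
        k = int(np.argmin(np.abs(obs[t] - x)))   # rank of the best member
        counts[k] += 1
        archives[k].append(obs[t] - x[k])        # best-member error
    return counts / T, [np.array(a) for a in archives]

# Toy underdispersive EPS: ensemble spread (0.3) much smaller than
# the typical forecast error (1.0), so extreme ranks are often best.
rng = np.random.default_rng(42)
truth = rng.normal(0.0, 1.0, size=2000)
centre = truth + rng.normal(0.0, 1.0, size=2000)
ens = centre[:, None] + rng.normal(0.0, 0.3, size=(2000, 20))
p, arch = rank_statistics(ens, truth)
```

With this underdispersive toy ensemble, p comes out U-shaped, as in Fig. 5(a).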

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we instead resample

1358 V FORTIN et al

Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

from ε*_(k) to obtain dressed ensemble members:

y_{t,k,n} = x_t(k) + ω · ε_{t,(k),n},    (6)

where ε_{t,(k),n} is drawn at random from ε*_(k).
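The per-rank dressing step can be sketched as follows, assuming the archives eps_star[k] have already been built from past forecasts. This is our own illustration: the names are hypothetical, and the rescaling factor in Eq. (6) is omitted for simplicity:

```python
import numpy as np

def dress_by_rank(ensemble, eps_star, n_per_member, rng):
    """Dress each order statistic of one ensemble with errors resampled
    from its own rank-specific archive (cf. Eq. (6), scaling omitted)."""
    x = np.sort(ensemble)                  # order statistics x_t(1..K)
    dressed = []
    for k, xk in enumerate(x):
        errs = rng.choice(eps_star[k], size=n_per_member, replace=True)
        dressed.append(xk + errs)          # y_{t,k,n} = x_t(k) + eps_{t,(k),n}
    return np.concatenate(dressed)

rng = np.random.default_rng(0)
eps_star = [rng.normal(0.0, 0.5, size=100) for _ in range(5)]  # toy archives
members = dress_by_rank(np.array([1.0, 0.2, -0.5, 2.0, 0.7]), eps_star, 10, rng)
```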

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_t(k). However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_t(k), and an independent error term ε_{t,(k),n} resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be respectively the mean and variance of x_t(k), and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_{t,(k),n}. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion w_k of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble:

p(y) = Σ_{k=1}^{K} w_k · p(y_(k)).    (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

σ²(y) = Σ_{k=1}^{K} w_k {σ²(x_(k)) + σ²(ε_(k)) + (μ(x_(k)) + μ(ε_(k)))²} − {Σ_{k=1}^{K} w_k (μ(x_(k)) + μ(ε_(k)))}².    (8)
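The finite-mixture variance identity of Eq. (8) is easily checked numerically. The sketch below is our own construction (a two-component Gaussian mixture with arbitrary illustrative numbers), comparing the formula with a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.array([0.3, 0.7])     # mixing weights w_k
mu = np.array([-1.0, 2.0])   # component means, mu(x_(k)) + mu(eps_(k))
var = np.array([0.5, 1.5])   # component variances

# Eq. (8): weighted second moments minus the squared mixture mean
var_formula = np.sum(w * (var + mu**2)) - np.sum(w * mu)**2

# Monte Carlo check: draw a component index, then a value from it
comp = rng.choice(2, size=200_000, p=w)
y = rng.normal(mu[comp], np.sqrt(var[comp]))
assert abs(y.var() - var_formula) < 0.05
```

For these numbers the formula gives 4.3 − 1.1² = 3.09, which the sample variance reproduces.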

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_x(k) and s²_ε(k), we can hence obtain an estimate ŝ²_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have on average the correct variance, we would like to find weights w_k for which ŝ²_y = s²_y, the observed variance of the observations over the same period:

s²_y = Σ_{k=1}^{K} w_k {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − {Σ_{k=1}^{K} w_k (x̄_(k) + ε̄_(k))}².    (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; ω, τ) = x^(ωτ−1) (1 − x)^(ω−ωτ−1) / B(ωτ, ω − ωτ),   ω > 0,  0 ≤ τ ≤ 1,    (10)

where B(α, β) = ∫_0^1 x^(α−1) (1 − x)^(β−1) dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5),


skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to be proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(ω, τ) = ∫_{(k−1)/K}^{k/K} x^(ωτ−1) (1 − x)^(ω−ωτ−1) dx / B(ωτ, ω − ωτ).    (11)

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

τ = Σ_{k=1}^{K} p_k ∫_{(k−1)/K}^{k/K} K·x dx = (1/K) Σ_{k=1}^{K} k·p_k − 1/(2K).    (12)

In this way the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between ŝ²_y and s²_y:

ω̂ = arg min_ω | Σ_{k=1}^{K} w_k(ω, τ)·{s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − {Σ_{k=1}^{K} w_k(ω, τ)·(x̄_(k) + ε̄_(k))}² − s²_y |.    (13)
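Equations (11)-(13) reduce to a beta cumulative distribution function and a one-dimensional search. The sketch below is our own illustration (the per-rank summary statistics are hypothetical numbers, and the use of scipy is our choice, not the authors'):

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.optimize import minimize_scalar

def weights(omega, tau, K):
    """w_k(omega, tau): mass of a beta(omega*tau, omega*(1-tau))
    density on each interval ((k-1)/K, k/K], cf. Eq. (11)."""
    edges = np.linspace(0.0, 1.0, K + 1)
    cdf = beta_dist.cdf(edges, omega * tau, omega * (1.0 - tau))
    return np.diff(cdf)

def ensemble_variance(w, x_mean, e_mean, x_var, e_var):
    """Eq. (9): variance of the dressed mixture for weights w."""
    m = x_mean + e_mean
    return np.sum(w * (x_var + e_var + m**2)) - np.sum(w * m)**2

def calibrate_omega(tau, s2_obs, x_mean, e_mean, x_var, e_var, K):
    """Eq. (13): omega minimizing |mixture variance - observed variance|."""
    obj = lambda om: abs(ensemble_variance(weights(om, tau, K),
                                           x_mean, e_mean, x_var, e_var) - s2_obs)
    return minimize_scalar(obj, bounds=(0.1, 100.0), method='bounded').x

# Hypothetical per-rank statistics for K = 5 (illustrative numbers only)
K = 5
x_mean = np.array([-1.2, -0.5, 0.0, 0.5, 1.2])  # xbar_(k)
e_mean = np.zeros(K)                             # ebar_(k)
x_var = np.full(K, 0.2)                          # s2_x(k)
e_var = np.full(K, 0.3)                          # s2_eps(k)
tau = 0.5                                        # symmetric p_k, cf. Eq. (12)
omega = calibrate_omega(tau, 1.0, x_mean, e_mean, x_var, e_var, K)
w = weights(omega, tau, K)
```

Here the uniform-weight mixture is overdispersive relative to the target variance of 1.0, so the optimization returns ω > 2, i.e. a bell-shaped weight function concentrating mass near the median.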

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that the calibration works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement in the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β2 = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts obtained using the method of WB05 (each ensemble consisting of M = N·K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

Figure 8. Normal probability plot comparing the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and of those obtained using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members, as a function of μ_a and K.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.


Figure 10. Location of the Chateauguay river basin, on the Canada-USA border near Montréal, with the St Lawrence and Ottawa rivers and the closest GFS grid point.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada-US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the


methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both the sharpness and the reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{+∞} (F(x) − H(x − y))² dx,    (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
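When F is the empirical distribution of an ensemble, the integral in Eq. (14) admits the kernel form CRPS(F, y) = E|X − y| − ½ E|X − X′|, with X and X′ independent draws from F. A minimal sketch (our own code, not the authors'):

```python
import numpy as np

def crps_ensemble(members, y):
    """CRPS of an empirical ensemble forecast against observation y.

    Uses the identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|,
    which equals the integral in Eq. (14) for the empirical CDF.
    """
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

# A deterministic 'ensemble' reduces the CRPS to the absolute error
assert np.isclose(crps_ensemble([2.0, 2.0, 2.0], 3.0), 1.0)
```

To score a system, this function is averaged over many forecast-observation pairs.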

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985-1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979-1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the mean precipitation observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.
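The bias handling and zero-truncation described above can be sketched as follows. This is an illustrative sketch with our own names and synthetic data; the centring on global calibration-period means follows the description in the text:

```python
import numpy as np

def centre_and_identify_best(forecasts, obs):
    """Centre forecasts and observations on their calibration-period means,
    then identify the best member of each ensemble by absolute difference."""
    f_c = forecasts - forecasts.mean()              # remove overall forecast bias
    o_c = obs - obs.mean()                          # centre observations
    best = np.argmin(np.abs(f_c - o_c[:, None]), axis=1)
    return f_c, o_c, best

def finalize_precip(statistical_members, obs_mean):
    """Add back the observed calibration-period mean, then truncate
    negative precipitation forecasts at zero."""
    return np.maximum(statistical_members + obs_mean, 0.0)

rng = np.random.default_rng(3)
obs = rng.gamma(2.0, 1.5, size=50)                     # toy daily precipitation
fc = obs[:, None] + 1.0 + rng.normal(0, 1, (50, 15))   # biased toy ensemble (+1 mm)
f_c, o_c, best = centre_and_identify_best(fc, obs)
out = finalize_precip(f_c, obs.mean())                 # centred members stand in
                                                       # for dressed members here
```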


Figure 11. CRPS (mm) of four ensemble prediction systems (EPS, RS03, WB05 and the weighted members method) and of the climatology, as a function of lead time, for calibration periods of (a) two years (Jan 1979-Dec 1980) and (b) six years (Jan 1979-Dec 1984).

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are however not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the method proposed here to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine whether the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting: if ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance


of the best-member method is highly dependent on the choice of this norm; we may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space-time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π̂_k and θ̂_k, respectively for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π̂_k · f(y_t − x_{t,k} | θ̂_k).    (15)

If the dynamical ensemble members are exchangeable prior to their observation (for example, if the same model is used for all members, with initial conditions obtained from a random perturbation of the observed initial conditions), then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} f(y_t − x_{t,k} | θ).    (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from


different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π_k · f_(k)(y_t − x_t(k)).    (17)
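Viewed as the mixture in Eq. (17), a predictive density can be evaluated by smoothing each per-rank error archive. The sketch below uses a Gaussian kernel density estimate, which is our choice of nonparametric smoother, not one specified by the paper; names and data are illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

def mixture_density(y, ensemble, pi, archives):
    """p(y | X_t) ~= sum_k pi_k * f_(k)(y - x_t(k)), cf. Eq. (17),
    with each f_(k) estimated by a KDE of the rank-k error archive."""
    x = np.sort(ensemble)                       # order statistics x_t(1..K)
    dens = 0.0
    for k, (w, xk) in enumerate(zip(pi, x)):
        kde = gaussian_kde(archives[k])         # nonparametric f_(k)
        dens += w * kde(np.atleast_1d(y - xk))[0]
    return dens

rng = np.random.default_rng(7)
K = 3
pi = np.array([0.2, 0.6, 0.2])                  # estimated p_k, used as weights
archives = [rng.normal(0.0, 1.0, 200) for _ in range(K)]
density = mixture_density(0.5, np.array([-1.0, 0.0, 1.0]), pi, archives)
```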

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture: joint estimation of the mixing proportions and probability density functions is not feasible.

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262
Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2
Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742
Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38
Eckel, F. A. 2003 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington
Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408
Gander, W. and Gautschi, W. 2000 Adaptive quadrature - revisited. BIT Numerical Mathematics, 40, 84–101
Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press
Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224
Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46
Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570
Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)
Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York
Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11-15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm
Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906
Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58
Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174
Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30
Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818
Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57
Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986
Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


because EPS forecasts are not necessarily reliable: they can be biased, and typically do not display enough variability, thus leading to an underestimation of the uncertainty (Buizza et al. 2005).

Different approaches have been proposed recently to build reliable probabilistic forecasts from such ensembles, including Bayesian model averaging (Raftery et al. 2005), the Bayesian processor of output (Krzysztofowicz 2004) and the best-member method (Roulston and Smith 2003), which is by far the simplest to implement amongst these three methods. While we agree with Krzysztofowicz (2004) that Bayesian theory provides the appropriate theoretical framework for obtaining the probability distribution of a predictand conditional on an ensemble of model outputs, we do feel that there is at the moment a need for simpler methods which can be readily implemented. Roulston and Smith (2003), hereafter RS03, have recently proposed the use of a simple resampling scheme, the 'best member' method: individual members of an ensemble are 'dressed' with an error distribution derived from the error made by the 'best' member of the ensemble. Wang and Bishop (2005), hereafter WB05, have however shown by stochastic simulations that the best-member method can lead both to underdispersive and overdispersive ensembles. They propose an improved method where the error distribution is rescaled so as to obtain ensembles which display the desired variance, and apply it both to a synthetic EPS and a real EPS based on a dynamical atmospheric model. This approach however fails in cases where the undressed ensemble members are already overdispersive. We propose to overcome this difficulty by dressing and weighting each member differently.

While underdispersion is a common feature of an EPS, overdispersion is also observed for some variables (Feddersen and Andersen 2005). In particular, if the perturbations in the analysis do not represent the observation error, but are tuned so that the EPS exhibits the right amount of variability at some lead time for some variables, then there is a real possibility that it will be overdispersive for longer lead times or other variables (Legg and Mylne 2004). It is also possible to observe overdispersion in a multimodel, multianalysis EPS, because individual ensemble members can produce drastically different results (Eckel 2003).

To demonstrate that dressing and weighting each member of an EPS differently can improve the reliability of the forecast, we focus in this paper on the problem of univariate forecasting. The proposed method could be adapted for multivariate forecasting problems, and we will suggest strategies to that effect, but a detailed analysis of all the necessary design choices is outside the scope of this paper. Clearly, in many cases the predictand is multidimensional. However, in a good number of these cases, this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional.

The paper is organized as follows: in the next section, we present the best-member method of RS03 and introduce the improved best-member method of WB05. In the third section, we present the synthetic example used by WB05 to illustrate their method, recall their main experimental results, and show that rescaling the error patterns leads in this case to probabilistic forecasts having heavier tails than the observations. In the fourth section, we propose a different correction technique for the best-member method, which uses a different error distribution and a different weight for each member of the ensemble. We show that this method leads to forecasts which not only have the right amount of variance, both for underdispersive and overdispersive EPS, but which also have better tail behaviour. The method is illustrated in section 5 using outputs from the


GFS reforecast experiment (Hamill et al. 2006). Results are discussed in section 6, and a brief conclusion follows.

2. THE BEST-MEMBER METHOD

A numerical weather prediction system provides forecasts for different weather elements on a grid at different lead times, using some information a_{t0} on the state of the atmosphere at the initial time t0 and a dynamical model of the atmosphere: x_t = g(a_{t0}). To assess the impact of uncertainty on the analysis a_{t0} and on the dynamical model structure g(·), it is possible to run different models with slightly different initial conditions. Let y_t = {y_{tj}, j = 1, 2, ..., J} be the vector of unknowns that the system is forecasting at time t, i.e. y_{tj} corresponds to one weather element at one location at a given lead time t (thus index j denotes one weather element at one location). Let x_{tk} = {x_{tkj}} be a forecast of y provided by the kth member of the EPS, and X_t = {x_{tk}, k = 1, 2, ..., K} denote the set of all ensemble members. Our objective is to obtain a probabilistic prediction of y_t given X_t, i.e. the predictive distribution p(y_t | X_t). In practice, we are in fact essentially interested in predictive simulations from p(y_t | X_t), i.e. scenarios ỹ_m, m = 1, 2, ..., M, sampled from p(y_t | X_t). Furthermore, we would often like to be able to obtain many more scenarios than there are ensemble members, i.e. M ≫ K, as we would like to be able to estimate the probability associated with extreme events.

RS03 have recently introduced the 'best-member' method to obtain scenarios which appear to be sampled from p(y_t | X_t). The idea is to 'dress' each ensemble member x_{tk} with a probability distribution representing the error made by this member when it happens to be the best member of the ensemble. Amongst the ensemble members x_{tk}, define the best member x*_t as the one which minimizes ||y_t − x_{tk}|| for some given norm ||·||:

x_t^* = \arg\min_{x_{tk}} \| y_t - x_{tk} \|    (1)

For a given norm and a database of past forecasts, build a probability distribution from realizations of ε*_t = y_t − x*_t. Finally, define a probabilistic forecast which consists of a finite mixture of error distributions centred on each original member of the ensemble:

p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} p_{\varepsilon^*}(y_t - x_{tk})    (2)

Simulating from this distribution is simple: to obtain M = N·K simulations, dress each of the K members of the ensemble by sampling N error patterns ε_{tkn}, n = 1, 2, ..., N, from a database of past forecasts, and add each error pattern to this ensemble member:

\tilde{y}_{tkn} = x_{tk} + \varepsilon_{tkn}    (3)
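As a concrete illustration, the dressing procedure of Eqs. (1)–(3) can be sketched in a few lines. This is a minimal univariate sketch assuming NumPy; the helper names `best_member_errors` and `dress` are ours, and the archive below is purely synthetic toy data, not the EPS of the later sections.

```python
import numpy as np

def best_member_errors(obs, ens):
    """eps*_t = y_t - x*_t, with x*_t the member closest to the
    observation under the absolute-difference norm (Eq. (1))."""
    best = np.abs(obs[:, None] - ens).argmin(axis=1)
    return obs - ens[np.arange(len(obs)), best]

def dress(members, error_archive, n_per_member, rng):
    """Best-member dressing (Eq. (3)): add N resampled past errors to
    each of the K members, giving M = N*K statistical members."""
    eps = rng.choice(error_archive, size=(len(members), n_per_member))
    return (members[:, None] + eps).ravel()

rng = np.random.default_rng(0)
# toy archive of T past forecast/observation pairs, K members each
T, K = 5000, 10
obs = rng.normal(size=T)
ens = rng.normal(size=(T, K))
archive = best_member_errors(obs, ens)

statistical_ensemble = dress(rng.normal(size=K), archive, 100, rng)
print(statistical_ensemble.shape)   # (1000,)
```

Because every member is dressed with the same error archive, this sketch reproduces the behaviour analysed below: it can only add variance to the dynamical ensemble.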

We will refer to X_t = {x_{tk}} as a dynamical ensemble, since it is obtained from a dynamical model of the atmosphere, and to Y_t = {ỹ_{tkn}} as a statistical ensemble. WB05 have shown that the best-member method does not lead to reliable forecasts. Reliability is generally a desirable feature for a probabilistic forecasting system. If the forecasted system is stationary, a probabilistic forecasting system is said to be reliable if we cannot tell apart a large set of observed values from a large set of forecasts. This is obviously the case for a perfect forecast, but also if the forecasts are simply issued


by drawing randomly from the stationary distribution of the observed process, which would correspond to climatology for weather elements.

Conditions under which the best-member method leads to reliable forecasts are not known. On the other hand, one can think of many examples where it fails. Consider, for example, an EPS which provides forecasts having the right amount of variance, but no skill. Unless the number of members is so large that the best-member error is always zero, the best-member error distribution will have a variance larger than zero. As the best-member method adds uncorrelated noise to each member, it increases the variance. Hence, the statistical ensemble will necessarily have too much variance if the dynamical ensemble already had enough.

(a) Rescaling the error patterns to obtain the right amount of variance
WB05 suggest that the error patterns be rescaled so that the covariance of any two randomly picked dressed ensemble members is the same as the covariance of the observations with any one of those two members. This technique has two disadvantages: first, it means that error patterns get distorted, so that one is adding to the ensemble members error patterns that did not actually occur, and which may be unfeasible. Secondly, the technique is only applicable when the dynamical ensemble is underdispersive. Indeed, as we still add uncorrelated noise to the dynamical ensemble, we can only increase the variance.

In the conclusion of their paper, WB05 mention that a possible solution to the second problem would be to weight each ensemble member differently. We show in this paper that this can solve both problems. The rationale for treating ensemble members differently, even when all members of the ensemble are generated identically and independently, is as follows: when we dress an ensemble member, we already know the outcome for each member. We can thus use information from the other members, in particular where it lies in the ensemble, measured for example by the distance from the ensemble mean, using the norm already defined to identify the best member of the ensemble. Consider an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the ensemble convex hull. Hence, a member which is on the outside of this hull will have more chance of being the best member of the ensemble, and thus can be given more weight. A similar argument can be made for an EPS which produces overdispersive dynamical ensembles: smaller weights should be given to ensemble members which are further away from the ensemble mean.

3. A SYNTHETIC EPS SYSTEM

We shall reuse in this paper the simple synthetic EPS set-up used by WB05 to illustrate the improvements they have proposed to the best-member method. Assume that observations y_t, t = 1, 2, ..., T, are independent normally distributed random variables with zero mean but time-dependent variance σ²_t(y). Assume also that σ²_t(y) is drawn from a chi-squared distribution with three degrees of freedom. An EPS provides a K-member forecast X_t = {x_{tk}, k = 1, 2, ..., K} for each observation y_t. All ensemble members are independent, identically distributed (iid) normal variates having zero mean and time-dependent variance σ²_t(x), where σ²_t(x) is related to σ²_t(y) by a random relationship σ²_t(x) = a_t · σ²_t(y), a_t being a uniform random variable on the interval [μ_a − 0.1, μ_a + 0.1], where μ_a is the expectation of a_t.

Figure 1 illustrates the set-up using a directed acyclic graph (DAG), or Bayesian network (Jensen 2001). In an acyclic graph, circles represent unknown parameters and


Figure 1. Directed acyclic graph representing the synthetic simulation set-up used by WB05 for (a) calibration and (b) prediction.

squares represent known parameters and observations. Arrows represent dependence links. Figure 1(a) presents the DAG for the calibration period, during which the observations y_t are known (and thus represented by a square), whereas Fig. 1(b) presents the DAG for the validation period, during which the observations y_t are unknown (and thus represented by a circle).

The EPS thus displays a 'spread-skill' relationship: when the observation y_t is less variable, thus more predictable (i.e. σ²_t(y) happens to be small), the observed variance of the ensemble members will tend to be smaller, and conversely. The EPS will be underdispersive if μ_a < 1 and overdispersive if μ_a > 1. Because the variable of interest is univariate and the forecasts are unbiased, the best member x*_t of each ensemble is simply identified by minimizing the absolute difference between the observations and the forecasts.

(a) When and why the best-member method fails
To test the best-member method, WB05 have used values of K from 1 to 20, and values of μ_a from 0.1 to 0.9. As we want to illustrate a method which gives a different weight to each ensemble member, we will focus on EPS having between K = 3 and K = 20 members. We will however consider values of μ_a from 0.1 to 1.9, as we will be able to deal with overdispersive EPS. As proposed by WB05, each time we will obtain a database of past forecasts by generating T = 15 000 observations y_t and corresponding ensemble forecasts X_t. The different methods will then be compared using statistics computed on statistical ensembles Y_t = {ỹ_{tkn}}, obtained by dressing a second set of T = 15 000 ensemble forecasts X_t. To obtain accurate statistics, N = 150 statistical ensemble members will be drawn from each ensemble member x_{tk}.

The observations y_t are drawn from an infinite mixture of normal distributions with zero mean but different variances, the variances being drawn from a chi-squared distribution with ν = 3 degrees of freedom, and thus having an expectation of E[σ²_t(y)] = ν = 3. It follows that the stationary distribution of the process y_t has zero mean, a variance σ²(y) of 3 (equal to the expectation of the chi-squared distribution) and a skewness of zero. Kurtosis, as estimated by recursive adaptive Simpson quadrature using the MATLAB function quad (Gander and Gautschi 2000), is very close to β₂ = 5. Recall that kurtosis measures the degree of peakedness of a distribution, that it is defined by the ratio of the fourth centred moment to the square of the variance, β₂ = μ₄/σ⁴, and


Figure 2. (a) Ratio of the variance of the statistical ensemble members generated by the original best-member method to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

that the kurtosis of the normal distribution is β₂ = 3 (Weisstein 2002). Kurtosis excess with respect to the normal distribution is defined by γ₂ = β₂ − 3. A distribution with a high peak and heavier tails than the normal distribution is said to be leptokurtic, and has a positive kurtosis excess, i.e. γ₂ > 0. A flat-topped curve with thin tails is said to be platykurtic, and has a negative kurtosis excess, i.e. γ₂ < 0. The stationary distribution of the process y_t, with a kurtosis excess of γ₂ = 2, is leptokurtic and thus has heavier tails than a normal distribution. The stationary distribution of the dynamical ensemble members X_t = {x_{tk}} and best-member error ε*_t also has zero mean and zero skewness, and therefore the stationary distribution of the statistical ensemble members Y_t = {ỹ_{tkn}} will also have zero mean and zero skewness. However, their variance and kurtosis vary with K, the number of members in the EPS, and μ_a, the expected amount of underdispersion or overdispersion of the undressed members.
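The value β₂ = 5 can also be obtained in closed form: conditionally on σ²_t(y), E[y⁴_t] = 3σ⁴, so E[y⁴] = 3 E[(σ²)²] = 3(Var[σ²] + E[σ²]²) = 3(6 + 9) = 45, and β₂ = 45/3² = 5 exactly. A quick Monte Carlo check of this (a sketch assuming NumPy, not the MATLAB quadrature used in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Scale mixture: y | s2 ~ N(0, s2) with s2 ~ chi-squared(3).
# E[y^2] = E[s2] = 3 and E[y^4] = 3 E[s2^2] = 3 (6 + 9) = 45,
# hence beta2 = 45 / 3^2 = 5, matching the quadrature estimate.
s2 = rng.chisquare(df=3, size=2_000_000)
y = rng.normal(0.0, np.sqrt(s2))

beta2 = np.mean(y**4) / np.mean(y**2) ** 2
print(round(beta2, 2))   # close to 5
```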

Figure 2(a) shows the ratio of the variance of the statistical ensemble members generated by the best-member method to the variance of the observations, as a function of μ_a and K, while Fig. 2(b) shows the difference between the kurtosis of the ensemble members and the kurtosis of the observations, as a function of μ_a and K. The thick line on both graphs shows the combinations of μ_a and K for which the statistical ensemble members have the correct variance and kurtosis.

It can be seen in Fig. 2(a) that the variance of the statistical ensemble increases with the variance of the dynamical ensemble, and decreases with the ensemble size. These two characteristics are not specific to this experiment, but rather much more general. Indeed, the variance of the statistical ensemble can only be larger than that of the dynamical ensemble, and as the number of ensemble members increases, the distance between the best ensemble member and the observation will generally decrease towards


Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations as a function of the expected amount of underdispersion (μ_a).

zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence, the best-member method cannot provide reliable forecasts in a general setting.

It is more difficult to explain the behaviour of the kurtosis as a function of μ_a and K expressed by Fig. 2(b). In this particular case, the kurtosis of the statistical ensemble decreases with μ_a for μ_a < 0.5, whereas for μ_a > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μ_a. It can be seen that for μ_a < 0.5, the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence, the best-member method starts with ensemble members which already have too heavy tails. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events
WB05, having shown that the best-member method does not produce ensembles with the correct variance, propose to dress each dynamical ensemble member with an independent error distribution having still a zero mean, but a covariance chosen such that the covariance between any pair of ensemble members be the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

s^2 = \mathrm{mse}(y, \bar{x}) - (1 + 1/K) \, s^2_x    (4)

where mse(y, x̄) = T⁻¹ Σ_{t=1}^{T} (y_t − x̄_t)² is the mean square error between the observations y_t and the ensemble mean x̄_t = K⁻¹ Σ_{k=1}^{K} x_{kt}, computed from a database of T forecasts and observations, and s²_x is the mean sample variance of the dynamical ensemble members. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.


Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor ω = √(s²/s²_{ε*}), where s²_{ε*} = (T − 1)⁻¹ Σ_{t=1}^{T} (ε*_t)² is the estimated variance of the best-member error. The statistical ensemble members will then be given by:

\tilde{y}_{tkn} = x_{tk} + \omega \, \varepsilon_{tkn}    (5)

where the innovations ε_{tkn} are drawn at random from a database of past error forecasts. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
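Under these definitions, the WB05 correction of Eqs. (4) and (5) can be sketched as follows. This assumes NumPy, uses an underdispersive configuration of the synthetic EPS of section 3 (constant a_t = 0.3 for simplicity), and all variable names are ours; it is an illustrative sketch, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(7)

# synthetic underdispersive archive: obs ~ N(0, 3), members ~ N(0, 0.3 * 3)
T, K = 20000, 10
obs = rng.normal(0.0, np.sqrt(3.0), size=T)
ens = rng.normal(0.0, np.sqrt(0.9), size=(T, K))

best = np.abs(obs[:, None] - ens).argmin(axis=1)
best_errors = obs - ens[np.arange(T), best]            # eps*_t

ens_mean = ens.mean(axis=1)
mse = np.mean((obs - ens_mean) ** 2)                   # mse(y, xbar)
s2_x = ens.var(axis=1, ddof=1).mean()                  # mean ensemble sample variance
s2_target = mse - (1.0 + 1.0 / K) * s2_x               # Eq. (4); negative if overdispersive
omega = np.sqrt(s2_target / best_errors.var(ddof=1))   # rescaling factor of Eq. (5)

# dress: N = 5 rescaled innovations per member per forecast
dressed = (ens[:, :, None] + omega * rng.choice(best_errors, size=(T, K, 5))).ravel()
print(omega, dressed.var())
```

For this underdispersive case ω comes out above one, and the pooled variance of the dressed members is close to that of the observations, as the method intends.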

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and roundoff errors. Figure 4(b) shows, however, that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence, the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its


error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, ..., u_K} is exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence, member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider, for example, an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence, the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member (otherwise it would not be the best member of the ensemble).

Hence, the probability that an ensemble member be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured, for example, by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members, and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel
Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_{t(k)} to be the kth order statistic of an ensemble X_t = {x_{tk}, k = 1, 2, ..., K}, and ε*_{(k)} = {y_t − x*_t | x*_t = x_{t(k)}, t = 1, 2, ..., T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_{t(k)}, i.e. p_k = Pr[x*_t = x_{t(k)}].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μ_a = 0.3 (an underdispersive ensemble) and for μ_a = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μ_a: as predicted, the lowest and highest ensemble members have a higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_{(k)}, again for μ_a = 0.3 and μ_a = 1.7. It can be seen that the error

distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence, it does not make a lot of sense to dress each ensemble member with the same error distribution.
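These rank-conditional statistics are straightforward to estimate from a forecast archive. A minimal sketch on the synthetic EPS follows (assuming NumPy; an underdispersive case with μ_a = 0.3, variable names ours):

```python
import numpy as np

rng = np.random.default_rng(5)

T, K, mu_a = 30000, 20, 0.3
s2_y = rng.chisquare(3, size=T)
a = rng.uniform(mu_a - 0.1, mu_a + 0.1, size=T)
y = rng.normal(0.0, np.sqrt(s2_y))
# sort each ensemble so that column k holds the order statistic x_t(k)
X = np.sort(rng.normal(0.0, np.sqrt(a * s2_y)[:, None], size=(T, K)), axis=1)

best = np.abs(y[:, None] - X).argmin(axis=1)      # rank (0-based) of the best member
p_k = np.bincount(best, minlength=K) / T          # estimate of Pr[x*_t = x_t(k)]
err = y - X[np.arange(T), best]                   # best-member errors, pooled by rank below
mean_k = np.array([err[best == k].mean() for k in range(K)])

# underdispersive case: extreme ranks are most often best, with biased errors
print(p_k[0], p_k[K // 2], mean_k[0])
```

The U-shape of p_k and the negative bias of the lowest rank's errors reproduce the qualitative behaviour of Figs. 5(a) and 6(a).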

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we instead resample


Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

from ε*_{(k)} to obtain dressed ensemble members:

\tilde{y}_{tkn} = x_{t(k)} + \omega \, \varepsilon_{t(k)n}    (6)

where ε_{t(k)n} is drawn at random from ε*_{(k)}.

(b) Giving a different weight to each dynamical ensemble member
The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_{t(k)}. However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member ỹ will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_{t(k)}, and an independent error term ε_{t(k)n} resampled from ε*_{(k)}. Let μ(x_{(k)}) and σ²(x_{(k)}) be respectively the mean and variance of x_{t(k)}, and μ(ε_{(k)}) and σ²(ε_{(k)}) be the mean and variance of ε_{t(k)n}. Denote by ỹ_{(k)} a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of ỹ_{(k)} are respectively given by μ(ỹ_{(k)}) = μ(x_{(k)}) + μ(ε_{(k)}) and σ²(ỹ_{(k)}) = σ²(x_{(k)}) + σ²(ε_{(k)}). If the number of forecasts in the archive is large, any given ensemble member ỹ can then be considered to have been drawn from a mixture of K distributions p(ỹ_{(k)}), where the probability of each distribution corresponds to the proportion of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble, w_k:

p(\tilde{y}) = \sum_{k=1}^{K} w_k \, p(\tilde{y}_{(k)})    (7)

It can be shown that the variance σ²(ỹ) of this finite mixture is given by:

\sigma^2(\tilde{y}) = \sum_{k=1}^{K} w_k \{ \sigma^2(x_{(k)}) + \sigma^2(\varepsilon_{(k)}) + (\mu(x_{(k)}) + \mu(\varepsilon_{(k)}))^2 \} - \left\{ \sum_{k=1}^{K} w_k (\mu(x_{(k)}) + \mu(\varepsilon_{(k)})) \right\}^2    (8)
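Equation (8) is the standard variance decomposition of a finite mixture, and is easy to verify numerically. The sketch below assumes NumPy and uses arbitrary made-up component moments with normal components (the specific numbers are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)

# made-up per-component moments: x(k)-part and error part of a dressed member
mu_x, s2_x = np.array([-1.0, 0.0, 1.2]), np.array([0.2, 0.1, 0.3])
mu_e, s2_e = np.array([0.3, 0.0, -0.2]), np.array([0.5, 0.2, 0.4])
w = np.array([0.25, 0.5, 0.25])

mu_k = mu_x + mu_e            # mean of y(k)
s2_k = s2_x + s2_e            # variance of y(k)
var_formula = np.sum(w * (s2_k + mu_k**2)) - np.sum(w * mu_k) ** 2   # Eq. (8)

# Monte Carlo check: sample a component index, then a normal variate
n = 500_000
k = rng.choice(3, size=n, p=w)
samples = rng.normal(mu_k[k], np.sqrt(s2_k[k]))
print(var_formula, samples.var())
```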

Given estimates of μ(x_{(k)}), μ(ε_{(k)}), σ²(x_{(k)}) and σ²(ε_{(k)}), respectively denoted by x̄_{(k)}, ε̄_{(k)}, s²_{x(k)} and s²_{ε(k)}, we can hence obtain an estimate s²_ỹ of σ²(ỹ) for a given set of weights w_k. In order to have statistical ensemble members which have on average the correct variance, we would like to find weights w_k for which s²_ỹ = s²_y, the observed variance of the observations on the same period:

s_y^2 = \sum_{k=1}^{K} w_k \{ s^2_{x(k)} + s^2_{\varepsilon(k)} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \} - \left\{ \sum_{k=1}^{K} w_k (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \right\}^2    (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles, the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; \omega, \tau) = \frac{x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}}{B(\omega\tau, \omega - \omega\tau)}, \quad \omega > 0, \ 0 < \tau < 1    (10)

where B(α, β) = ∫₀¹ x^{α−1}(1 − x)^{β−1} dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5),


skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(\omega, \tau) = \frac{\int_{(k-1)/K}^{k/K} x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1} \, dx}{B(\omega\tau, \omega - \omega\tau)}    (11)

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

\tau = \sum_{k=1}^{K} p_k \int_{(k-1)/K}^{k/K} K x \, dx = \frac{1}{K} \sum_{k=1}^{K} k \, p_k - \frac{1}{2K}    (12)

In this way, the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS, the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between s²_ỹ and s²_y:

\hat{\omega} = \arg\min_{\omega} \left| \sum_{k=1}^{K} w_k(\omega, \tau) \{ s^2_{x(k)} + s^2_{\varepsilon(k)} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \} - \left\{ \sum_{k=1}^{K} w_k(\omega, \tau) (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \right\}^2 - s_y^2 \right|    (13)
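Putting Eqs. (11)–(13) together, the weight calibration can be sketched as follows. This assumes NumPy and SciPy; `weights` and `fit_omega` are our own names, the one-dimensional bounded search is one reasonable choice among several, and the per-rank statistics passed to `fit_omega` would in practice come from the forecast archive (here they are made up for illustration).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import beta as beta_dist

def weights(omega, tau, K):
    """w_k proportional to the Beta(omega*tau, omega - omega*tau) density
    integrated over [(k-1)/K, k/K] (Eq. (11))."""
    edges = np.linspace(0.0, 1.0, K + 1)
    cdf = beta_dist.cdf(edges, omega * tau, omega * (1.0 - tau))
    return np.diff(cdf)

def fit_omega(tau, K, xbar_k, ebar_k, s2x_k, s2e_k, s2_y):
    """omega minimizing the gap between the mixture variance of Eq. (9)
    and the observed variance s2_y (Eq. (13))."""
    mu_k = xbar_k + ebar_k          # per-rank mean of dressed members
    s2_k = s2x_k + s2e_k            # per-rank variance of dressed members
    def gap(omega):
        w = weights(omega, tau, K)
        mix_var = np.sum(w * (s2_k + mu_k**2)) - np.sum(w * mu_k) ** 2
        return abs(mix_var - s2_y)
    return minimize_scalar(gap, bounds=(0.1, 200.0), method="bounded").x

# made-up symmetric per-rank statistics (tau = 0.5), target variance 3.0
K = 20
mu = np.linspace(-2.0, 2.0, K)
omega_hat = fit_omega(0.5, K, mu, np.zeros(K), np.full(K, 0.6), np.full(K, 0.4), 3.0)
print(omega_hat)
```

In this made-up case the target variance exceeds what uniform weights produce, so the fitted ω falls below 2 and the weights are U-shaped, mirroring the underdispersive regime discussed after Eq. (10).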

(c) The weighted members method applied to the synthetic EPS system
We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts obtained using the method of WB05 (each ensemble consisting of M = N·K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive



Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).


Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.



Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k·M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large, so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.
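As a concrete illustration of the constraint just described, the sketch below flags the order statistics for which the expected number of draws N_k = p_k·M exceeds the size of the corresponding error archive. The weights and archive sizes are made-up numbers for illustration only.

```python
import numpy as np

def insufficient_archives(p, archive_sizes, M):
    # Expected draws per order statistic: N_k = p_k * M (see text).
    # Returns the indices k for which N_k exceeds the archive size.
    N = np.asarray(p) * M
    return np.flatnonzero(N > np.asarray(archive_sizes))

p = np.array([0.30, 0.15, 0.10, 0.15, 0.30])  # weights p_k (sum to one)
sizes = np.array([50, 90, 60, 85, 130])       # stored errors per order statistic
bad = insufficient_archives(p, sizes, M=225)  # order statistics needing more data
```

Here the first order statistic would be resampled more often than its archive allows (N_0 = 67.5 > 50), so identical statistical members would start to appear.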



Figure 10. Location of the Chateauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the


methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{+∞} {F(x) − H(x − y)}² dx    (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
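When F is the empirical cumulative distribution function of an ensemble, the integral in Eq. (14) has the closed form CRPS = E|X − y| − ½E|X − X′|, with expectations taken over the ensemble members (including identical pairs). A minimal sketch with toy numbers, checked against a brute-force quadrature of Eq. (14):

```python
import numpy as np

def crps_ensemble(members, y):
    # Closed-form CRPS for the empirical CDF of the ensemble:
    # mean |x_i - y| minus half the mean pairwise distance |x_i - x_j|.
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

def crps_numeric(members, y, lo=-10.0, hi=10.0, n=200001):
    # Brute-force Riemann sum of Eq. (14) as a sanity check:
    # F is the empirical CDF, H the Heaviside step at the observation.
    grid = np.linspace(lo, hi, n)
    F = np.searchsorted(np.sort(members), grid, side="right") / len(members)
    H = (grid >= y).astype(float)
    return np.sum((F - H) ** 2) * (grid[1] - grid[0])

ens = np.array([-1.2, 0.3, 0.8, 1.5, 2.1])
obs = 0.5
score = crps_ensemble(ens, obs)   # same units as the predictand
```

A perfect deterministic forecast (all members equal to the observation) scores zero, consistent with the interpretation of the CRPS as a generalized mean absolute error.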

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.
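The centring procedure just described can be sketched as follows; the array names and the toy data (a biased pseudo-precipitation ensemble) are illustrative assumptions.

```python
import numpy as np

def best_member_indices(obs, fcst):
    # obs: (T,) verifying observations; fcst: (T, K) ensemble forecasts.
    # Centre both series first so the identification of the best member
    # does not depend on the overall forecast bias.
    obs_c = obs - obs.mean()
    fcst_c = fcst - fcst.mean()
    return np.argmin(np.abs(fcst_c - obs_c[:, None]), axis=1)

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 2.0, size=200)                        # pseudo-precipitation
fcst = obs[:, None] + 3.0 + rng.normal(0, 1, (200, 5))     # biased ensemble
idx = best_member_indices(obs, fcst)                       # best member per day

# Unbiased forecasts: add the observed calibration-period mean back.
unbiased = fcst - fcst.mean() + obs.mean()
```

The final step mirrors the text: the statistical members inherit the observed climatological mean rather than the model's biased one.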

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.



Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period. It can be seen that although the methods of RS03 and WB05 always score better than the dynamical EPS, which shows no skill for all lead times, the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are however not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS system does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS system is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional, but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance


of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_tk as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_tk | θ_k), depends only on x_tk and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically this is done using the Expectation–Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π̂_k·f(y_t − x_tk | θ̂_k)    (15)

If the dynamical ensemble members are exchangeable prior to their observations, for example if the same model is used for all members, from initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} f(y_t − x_tk | θ̂)    (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from


different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π_k·f_(k)(y_t − x_t(k))    (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
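Drawing a statistical ensemble from the mixture (17) then amounts to a two-step resampling: choose an order statistic with probability π_k, then add an error resampled from that order statistic's archive ε*_(k). A minimal sketch, with made-up weights and Gaussian pseudo-archives standing in for real archives of past errors:

```python
import numpy as np

def sample_weighted_members(x_sorted, pi, archives, M, rng):
    # x_sorted: order statistics of the dynamical ensemble;
    # pi: weights pi_k; archives: list of past-error samples eps*_(k).
    ks = rng.choice(len(x_sorted), size=M, p=pi)
    return np.array([x_sorted[k] + rng.choice(archives[k]) for k in ks])

rng = np.random.default_rng(2)
x_sorted = np.sort(np.array([-0.8, -0.1, 0.4, 1.1, 1.9]))  # order statistics
pi = np.array([0.1, 0.2, 0.4, 0.2, 0.1])                   # weights pi_k
archives = [rng.normal(0, 0.5, 300) for _ in range(5)]     # pseudo eps*_(k)
members = sample_weighted_members(x_sorted, pi, archives, M=1000, rng=rng)
```

The expected value of a drawn member is the weighted mean of the order statistics (here Σπ_k·x_(k) = 0.47, since the pseudo-archives are centred).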

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, and which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262

Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38

Eckel, F. A. 2003 'Effective mesoscale short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W. 2000 Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydrométéorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)

Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57

Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


GFS reforecast experiment (Hamill et al. 2006). Results are discussed in section 6, and a brief conclusion follows.

2. THE BEST-MEMBER METHOD

A numerical weather prediction system provides forecasts for different weather elements on a grid at different lead times, using some information a_t0 on the state of the atmosphere at the initial time t_0 and a dynamical model of the atmosphere, x_t = g(a_t0). To assess the impact of uncertainty on the analysis a_t0 and on the dynamical model structure g(·), it is possible to run different models with slightly different initial conditions. Let y_t = {y_tj, j = 1, 2, . . . , J} be the vector of unknowns that the system is forecasting at time t, i.e. y_tj corresponds to one weather element at one location at a given lead time t (thus index j denotes one weather element at one location). Let x_tk = {x_tkj} be a forecast of y provided by the kth member of the EPS, and X_t = {x_tk, k = 1, 2, . . . , K} denote the set of all ensemble members. Our objective is to obtain a probabilistic prediction of y_t given X_t, i.e. the predictive distribution p(y_t | X_t). In practice, we are in fact essentially interested in predictive simulations from p(y_t | X_t), i.e. scenarios y_m, m = 1, 2, . . . , M, sampled from p(y_t | X_t). Furthermore, we would often like to be able to obtain many more scenarios than there are ensemble members, i.e. M ≫ K, as we would like to be able to estimate the probability associated with extreme events.

RS03 have recently introduced the 'best-member' method to obtain scenarios which appear to be sampled from p(y_t | X_t). The idea is to 'dress' each ensemble member x_tk with a probability distribution representing the error made by this member when it happens to be the best member of the ensemble. Amongst the ensemble members x_tk, define the best member x*_t as the one which minimizes ‖y_t − x_tk‖ for some given norm ‖·‖:

x*_t = arg min_{x_tk} ‖y_t − x_tk‖    (1)

For a given norm and a database of past forecasts, build a probability distribution from realizations of ε*_t = y_t − x*_t. Finally, define a probabilistic forecast which consists of a finite mixture of error distributions centred on each original member of the ensemble:

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} p_{ε*}(y_t − x_tk)    (2)

Simulating from this distribution is simple: to obtain M = N·K simulations, dress each of the K members of the ensemble by sampling N error patterns ε_tkn, n = 1, 2, . . . , N, from a database of past forecasts, and add each error pattern to this ensemble member:

y_tkn = x_tk + ε_tkn    (3)
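The procedure in Eqs. (1)–(3) can be sketched end-to-end: build an archive of best-member errors over a training period, then dress each member of a new ensemble with N resampled error patterns. The toy data and array names below are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K, N = 1000, 5, 10

# Training period: archive the error of the best member at each time.
y_train = rng.normal(0, 1, T)
X_train = y_train[:, None] * 0.5 + rng.normal(0, 0.5, (T, K))  # toy EPS
best = np.argmin(np.abs(X_train - y_train[:, None]), axis=1)   # Eq. (1)
archive = y_train - X_train[np.arange(T), best]                # eps*_t

# Forecast time: dress each member (Eq. (3)) to get M = N*K members.
x_new = rng.normal(0, 1, K)
dressed = (x_new[:, None] + rng.choice(archive, size=(K, N))).ravel()
```

Note that every member is dressed with the same error distribution p_ε*, which is precisely the assumption the weighted members method relaxes later in the paper.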

We will refer to X_t = {x_tk} as a dynamical ensemble, since it is obtained from a dynamical model of the atmosphere, and to Y_t = {y_tkn} as a statistical ensemble. WB05 have shown that the best-member method does not lead to reliable forecasts. Reliability is generally a desirable feature for a probabilistic forecasting system. If the forecasted system is stationary, a probabilistic forecasting system is said to be reliable if we cannot tell apart a large set of observed values from a large set of forecasts. This is obviously the case for a perfect forecast, but also if the forecasts are simply issued


by drawing randomly from the stationary distribution of the observed process, which would correspond to climatology for weather elements.

Conditions under which the best-member method leads to reliable forecasts are not known. On the other hand, one can think of many examples where it fails. Consider for example an EPS which provides forecasts having the right amount of variance, but no skill. Unless the number of members is so large that the best-member error is always zero, the best-member error distribution will have a variance larger than zero. As the best-member method adds uncorrelated noise to each member, it increases the variance. Hence the statistical ensemble will necessarily have too much variance if the dynamical ensemble already had enough.

(a) Rescaling the error patterns to obtain the right amount of variance

WB05 suggest that the error patterns be rescaled so that the covariance of any two randomly picked dressed ensemble members is the same as the covariance of the observations with any one of those two members. This technique has two disadvantages: first, it means that error patterns get distorted, so that one is adding to the ensemble members error patterns that did not actually occur, and which may be unfeasible. Secondly, the technique is only applicable when the dynamical ensemble is underdispersive. Indeed, as we still add uncorrelated noise to the dynamical ensemble, we can only increase the variance.

In the conclusion of their paper, WB05 mention that a possible solution to the second problem would be to weight each ensemble member differently. We show in this paper that this can solve both problems. The rationale for treating ensemble members differently, even when all members of the ensemble are generated identically and independently, is as follows: when we dress an ensemble member, we already know the outcome for each member. We can thus use information from the other members, in particular where it lies in the ensemble, measured for example by the distance from the ensemble mean, using the norm already defined to identify the best member of the ensemble. Consider an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the ensemble convex hull. Hence, a member which is on the outside of this hull will have more chance of being the best member of the ensemble, and thus can be given more weight. A similar argument can be made for an EPS which produces overdispersive dynamical ensembles: smaller weights should be given to ensemble members which are further away from the ensemble mean.

3. A SYNTHETIC EPS SYSTEM

We shall reuse in this paper the simple synthetic EPS set-up used by WB05 to illustrate the improvements they have proposed to the best-member method. Assume that observations y_t, t = 1, 2, . . . , T, are independent normally distributed random variables with zero mean but time-dependent variance σ²_t(y). Assume also that σ²_t(y) is drawn from a chi-squared distribution with three degrees of freedom. An EPS provides a K-member forecast X_t = {x_tk, k = 1, 2, . . . , K} for each observation y_t. All ensemble members are independent identically distributed (iid) normal variates having zero mean and time-dependent variance σ²_t(x), where σ²_t(x) is related to σ²_t(y) by a random relationship σ²_t(x) = a_t·σ²_t(y), a_t being a uniform random variable on the interval [μ_a − 0.1, μ_a + 0.1], where μ_a is the expectation of a_t.

Figure 1 illustrates the set-up using a directed acyclic graph (DAG), or Bayesian network (Jensen 2001). In an acyclic graph, circles represent unknown parameters and


Figure 1. Directed acyclic graph representing the synthetic simulation set-up used by WB05 for (a) calibration and (b) prediction.

squares represent known parameters and observations. Arrows represent dependence links. Figure 1(a) presents the DAG for the calibration period, during which the observations y_t are known (and thus represented by a square), whereas Fig. 1(b) presents the DAG for the validation period, during which the observations y_t are unknown (and thus represented by a circle).

The EPS thus displays a 'spread–skill' relationship: when the observation y_t is less variable, thus more predictable (i.e. σ²_t(y) happens to be small), the observed variance of the ensemble members will tend to be smaller, and conversely. The EPS will be underdispersive if μ_a < 1 and overdispersive if μ_a > 1. Because the variable of interest is univariate and the forecasts are unbiased, the best member x*_t of each ensemble is simply identified by minimizing the absolute difference between the observations and the forecasts.

(a) When and why the best-member method fails

To test the best-member method, WB05 have used values of K from 1 to 20 and values of μ_a from 0.1 to 0.9. As we want to illustrate a method which gives a different weight to each ensemble member, we will focus on EPS having between K = 3 and K = 20 members. We will however consider values of μ_a from 0.1 to 1.9, as we will be able to deal with overdispersive EPS. As proposed by WB05, each time we will obtain a database of past forecasts by generating T = 15 000 observations y_t and corresponding ensemble forecasts X_t. The different methods will then be compared using statistics computed on statistical ensembles Y_t = {y_tkn}, obtained by dressing a second set of T = 15 000 ensemble forecasts X_t. To obtain accurate statistics, N = 150 statistical ensemble members will be drawn from each ensemble member x_tk.
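The calibration and dressing steps of the best-member method can be sketched as follows (Python/NumPy; a hypothetical re-implementation of the experiment, not the authors' code, and only the first 100 forecasts are dressed here to keep the arrays small).

```python
import numpy as np

# Sketch of best-member calibration and dressing on the synthetic set-up,
# with mu_a = 0.3 (underdispersive). All names are illustrative.
rng = np.random.default_rng(0)
T, K, N = 15000, 20, 150
var_y = rng.chisquare(3, size=T)
y = rng.normal(0.0, np.sqrt(var_y))
a = rng.uniform(0.2, 0.4, size=T)                       # mu_a = 0.3
X = rng.normal(0.0, np.sqrt(a * var_y)[:, None], size=(T, K))

# Calibration: archive the error made by the best member at each time step
best = np.argmin(np.abs(X - y[:, None]), axis=1)        # index of best member
errors = y - X[np.arange(T), best]                      # best-member error archive

# Dressing: perturb each member of (here) the first 100 forecasts with N
# errors resampled from the archive, giving K * N statistical members each
dressed = X[:100, :, None] + rng.choice(errors, size=(100, K, N))
```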

The observations y_t are drawn from an infinite mixture of normal distributions with zero mean but different variances, the variances being drawn from a chi-squared with ν = 3 degrees of freedom, and thus having an expectation of E[σ²_t(y)] = ν = 3. It follows that the stationary distribution of the process y_t has zero mean, a variance σ²(y) of 3 (equal to the expectation of the chi-squared distribution), and a skewness of zero. Kurtosis, as estimated by recursive adaptive Simpson quadrature using the MATLAB function quad (Gander and Gautschi 2000), is very close to β₂ = 5. Recall that kurtosis measures the degree of peakedness of a distribution, that it is defined by the ratio of the fourth centred moment to the square of the variance, β₂ = μ₄/σ⁴, and
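The value β₂ = 5 can also be recovered exactly from the moments of the mixing distribution, since for a normal variate E[y⁴ | σ²] = 3σ⁴; a short check in Python:

```python
# Kurtosis of the stationary distribution of y_t from the chi-squared moments.
# y_t | sigma2 ~ N(0, sigma2) with sigma2 ~ chi-squared(nu), so
# E[y^4] = 3 * E[sigma2^2] and Var[y] = E[sigma2].
nu = 3.0
mean_s2 = nu                               # E[sigma2]
var_s2 = 2.0 * nu                          # Var[sigma2]
second_moment_s2 = var_s2 + mean_s2**2     # E[sigma2^2] = 15

var_y = mean_s2                            # Var[y] = 3 (zero-mean mixture)
fourth_moment_y = 3.0 * second_moment_s2   # E[y^4] = 45
beta2 = fourth_moment_y / var_y**2         # mu_4 / sigma^4
print(beta2)  # 5.0
```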



Figure 2. (a) Ratio of the variance of the statistical ensemble members generated by the original best-member method to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

that the kurtosis of the normal distribution is β₂ = 3 (Weisstein 2002). Kurtosis excess with respect to the normal distribution is defined by γ₂ = β₂ − 3. A distribution with a high peak and heavier tails than the normal distribution is said to be leptokurtic, and has a positive kurtosis excess, i.e. γ₂ > 0. A flat-topped curve with thin tails is said to be platykurtic, and has a negative kurtosis excess, i.e. γ₂ < 0. The stationary distribution of the process y_t, with a kurtosis excess of γ₂ = 2, is leptokurtic and thus has heavier tails than a normal distribution. The stationary distribution of the dynamical ensemble members X_t = {x_tk} and best-member error ε*_t also has zero mean and zero skewness, and therefore the stationary distribution of the statistical ensemble members Y_t = {y_tkn} will also have zero mean and zero skewness. However, their variance and kurtosis vary with K, the number of members in the EPS, and μ_a, the expected amount of underdispersion or overdispersion of the undressed members.

Figure 2(a) shows the ratio of the variance of the statistical ensemble members generated by the best-member method to the variance of the observations as a function of μ_a and K, while Fig. 2(b) shows the difference between the kurtosis of the ensemble members and the kurtosis of the observations as a function of μ_a and K. The thick line on both graphs shows the combinations of μ_a and K for which the statistical ensemble members have the correct variance and kurtosis.

It can be seen in Fig. 2(a) that the variance of the statistical ensemble increases with the variance of the dynamical ensemble and decreases with the ensemble size. These two characteristics are not specific to this experiment, but rather much more general. Indeed, the variance of the statistical ensemble can only be larger than that of the dynamical ensemble, and as the number of ensemble members increases, the distance between the best ensemble member and the observation will generally decrease towards


Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations as a function of the expected amount of underdispersion (μ_a).

zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence the best-member method cannot provide reliable forecasts in a general setting.

It is more difficult to explain the behaviour of the kurtosis as a function of μ_a and K expressed by Fig. 2(b). In this particular case, the kurtosis of the statistical ensemble decreases with μ_a for μ_a < 0.5, whereas for μ_a > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μ_a. It can be seen that for μ_a < 0.5 the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence the best-member method starts with ensemble members which already have too heavy tails. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events

WB05, having shown that the best-member method does not produce ensembles with the correct variance, propose to dress each dynamical ensemble member with an independent error distribution having still a zero mean, but a covariance chosen such that the covariance between any pair of ensemble members be the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

s² = mse(y, x̄) − (1 + 1/K) · s²_x,   (4)

where mse(y, x̄) = T⁻¹ Σ_{t=1}^{T} (y_t − x̄_t)² is the mean square error between the observations y_t and the ensemble mean x̄_t = K⁻¹ Σ_{k=1}^{K} x_kt, computed from a database of T forecasts and observations. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.


Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor ω = √(s²/s²_ε*), where s²_ε* = (T − 1)⁻¹ Σ_{t=1}^{T} (ε*_t)² is the estimated variance of the best-member error. The statistical ensemble members will then be given by

y_tkn = x_tk + ω · ε_tkn,   (5)

where the innovations ε_tkn are drawn at random from a database of past error forecasts. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
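This rescaling can be sketched on the synthetic set-up as follows (Python/NumPy; we interpret s²_x in Eq. (4) as the time-mean ensemble variance, and all names are illustrative).

```python
import numpy as np

# Sketch of Eqs. (4)-(5) on the synthetic set-up, with mu_a = 0.3 so that the
# dynamical ensemble is underdispersive and s^2 is positive.
rng = np.random.default_rng(1)
T, K = 15000, 20
var_y = rng.chisquare(3, size=T)
y = rng.normal(0.0, np.sqrt(var_y))
a = rng.uniform(0.2, 0.4, size=T)
X = rng.normal(0.0, np.sqrt(a * var_y)[:, None], size=(T, K))
best = np.argmin(np.abs(X - y[:, None]), axis=1)
errors = y - X[np.arange(T), best]          # best-member error archive

mse = np.mean((y - X.mean(axis=1)) ** 2)    # mse(y, x_bar)
s2_x = X.var(axis=1, ddof=1).mean()         # time-mean ensemble variance
s2 = mse - (1.0 + 1.0 / K) * s2_x           # Eq. (4)
omega = np.sqrt(s2 / errors.var(ddof=1))    # rescaling factor for the errors
dressed = X[:, :, None] + omega * rng.choice(errors, size=(T, K, 10))  # Eq. (5)
```

The dressed members then display (on average) the variance of the observations, which is the stated goal of the WB05 modification.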

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and round-off errors. Figure 4(b) shows however that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its


error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, . . . , u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider for example an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member (otherwise it would not be the best member of the ensemble).

Hence the probability that an ensemble member be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured, for example, by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members, and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_t(k) to be the kth order statistic of an ensemble X_t = {x_tk, k = 1, 2, . . . , K}, and ε*_(k) = {y_t − x*_t | x*_t = x_t(k), t = 1, 2, . . . , T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_t(k), i.e. p_k = Pr[x*_t = x_t(k)].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μ_a = 0.3 (an underdispersive ensemble) and for μ_a = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μ_a: as predicted, the lowest and highest ensemble members have higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μ_a = 0.3 and μ_a = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence it does not make a lot of sense to dress each ensemble member with the same error distribution.
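These quantities are simple to estimate from a calibration archive. The sketch below (Python/NumPy, illustrative names, μ_a = 0.3) computes the empirical p_k and the per-rank error means for an underdispersive synthetic EPS.

```python
import numpy as np

# Sketch: empirical estimates of p_k and of the per-rank error bias from a
# calibration archive (illustrative synthetic set-up with mu_a = 0.3).
rng = np.random.default_rng(3)
T, K = 15000, 20
var_y = rng.chisquare(3, size=T)
y = rng.normal(0.0, np.sqrt(var_y))
a = rng.uniform(0.2, 0.4, size=T)
X = np.sort(rng.normal(0.0, np.sqrt(a * var_y)[:, None], size=(T, K)), axis=1)

best_rank = np.argmin(np.abs(X - y[:, None]), axis=1)    # rank of best member
p_k = np.bincount(best_rank, minlength=K) / T            # Pr[x*_t = x_t(k)]
err_mean = np.array([np.mean(y[best_rank == k] - X[best_rank == k, k])
                     for k in range(K)])
# For this underdispersive EPS, p_k is U-shaped and the error distribution is
# biased for the extreme ranks (negatively for the lowest order statistic).
```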

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we instead resample


Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.


Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

from ε*_(k), to obtain dressed ensemble members:

y_tkn = x_t(k) + ω · ε_t(k)n,   (6)

where ε_t(k)n is drawn at random from ε*_(k).
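A sketch of this per-rank resampling (Python/NumPy, illustrative names, with ω = 1 and a small number of innovations per member for brevity):

```python
import numpy as np

# Sketch of Eq. (6): build one error archive per order statistic and dress the
# k-th order statistic only with errors observed when the best member had
# rank k (illustrative synthetic set-up, omega = 1 and N = 10 for brevity).
rng = np.random.default_rng(2)
T, K, N = 15000, 20, 10
var_y = rng.chisquare(3, size=T)
y = rng.normal(0.0, np.sqrt(var_y))
a = rng.uniform(0.2, 0.4, size=T)                       # mu_a = 0.3
X = np.sort(rng.normal(0.0, np.sqrt(a * var_y)[:, None], size=(T, K)), axis=1)

best_rank = np.argmin(np.abs(X - y[:, None]), axis=1)
archives = [y[best_rank == k] - X[best_rank == k, k] for k in range(K)]

# Dress a new (sorted) forecast: rank k is perturbed with innovations
# resampled from its own archive eps*_(k)
x_new = X[0]
dressed = np.concatenate(
    [x_new[k] + rng.choice(archives[k], size=N) for k in range(K)])
```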

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_t(k). However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member, so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_t(k), and an independent error term ε_t(k)n resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be respectively the mean and variance of x_t(k), and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_t(k)n. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble, w_k:

p(y) = Σ_{k=1}^{K} w_k · p(y_(k)).   (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

σ²(y) = Σ_{k=1}^{K} w_k {σ²(x_(k)) + σ²(ε_(k)) + (μ(x_(k)) + μ(ε_(k)))²} − {Σ_{k=1}^{K} w_k (μ(x_(k)) + μ(ε_(k)))}².   (8)

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_x(k) and s²_ε(k), we can hence obtain an estimate s²_ŷ of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have on average the correct variance, we would like to find weights w_k for which s²_ŷ = s²_y, the observed variance of the observations on the same period:

s²_y = Σ_{k=1}^{K} w_k {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − {Σ_{k=1}^{K} w_k (x̄_(k) + ε̄_(k))}².   (9)
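Equation (9) is easy to evaluate for a candidate set of weights; a minimal sketch with a sanity check (a mixture of identical components must collapse to the common variance σ²(x) + σ²(ε)):

```python
import numpy as np

# Sketch of Eq. (9): the variance implied by a set of weights w_k, given
# per-rank sample statistics of the order statistics and their error archives.
def mixture_variance(w, x_mean, x_var, e_mean, e_var):
    """Variance of the weighted mixture of dressed order statistics."""
    m = x_mean + e_mean                 # mean of y_(k) = x_(k) + eps_(k)
    second = x_var + e_var + m ** 2     # E[y_(k)^2]
    return np.sum(w * second) - np.sum(w * m) ** 2

# Identical components: variance is 2.0 + 1.0 = 3.0 regardless of the weights
w = np.full(4, 0.25)
v = mixture_variance(w, np.zeros(4), np.full(4, 2.0), np.zeros(4), np.full(4, 1.0))
print(v)  # 3.0
```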

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles, the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; ω, τ) = x^(ω·τ−1) · (1 − x)^(ω−ω·τ−1) / B(ω·τ, ω − ω·τ),   ω > 0, 0 ≤ τ ≤ 1,   (10)

where B(α, β) = ∫₀¹ x^(α−1) (1 − x)^(β−1) dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5),


skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, . . . , K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(ω, τ) = ∫_{(k−1)/K}^{k/K} x^(ω·τ−1) · (1 − x)^(ω−ω·τ−1) dx / B(ω·τ, ω − ω·τ).   (11)

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

τ = Σ_{k=1}^{K} ∫_{(k−1)/K}^{k/K} K · p_k · x dx = (1/K) Σ_{k=1}^{K} k · p_k − 1/(2K).   (12)

In this way, the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS, the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between s²_ŷ and s²_y:

ω = arg min_ω | Σ_{k=1}^{K} w_k(ω, τ) · {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − {Σ_{k=1}^{K} w_k(ω, τ) · (x̄_(k) + ε̄_(k))}² − s²_y |.   (13)
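The weight function (11), the choice of τ (12) and the one-dimensional search (13) can be sketched as follows (Python/NumPy; the midpoint quadrature and the brute-force grid search are illustrative choices, not the authors' implementation).

```python
import numpy as np
from math import lgamma, exp

# Sketch of Eqs. (11)-(13): beta-distribution weights and calibration of omega.
def beta_weights(K, omega, tau, n_per_bin=100):
    """Eq. (11): w_k proportional to the integral of the beta pdf over
    ((k-1)/K, k/K), approximated here by midpoint quadrature."""
    al, be = omega * tau, omega * (1.0 - tau)
    x = (np.arange(K * n_per_bin) + 0.5) / (K * n_per_bin)
    log_b = lgamma(al) + lgamma(be) - lgamma(al + be)   # log B(al, be)
    pdf = x ** (al - 1.0) * (1.0 - x) ** (be - 1.0) / exp(log_b)
    w = pdf.reshape(K, n_per_bin).mean(axis=1) / K
    return w / w.sum()          # renormalize away the quadrature error

def tau_from_pk(p):
    """Eq. (12): tau as the rescaled expectation of p_k (symmetric p_k -> 0.5)."""
    K = len(p)
    return np.sum(np.arange(1, K + 1) * p) / K - 1.0 / (2.0 * K)

def fit_omega(K, tau, x_mean, x_var, e_mean, e_var, s2_target):
    """Eq. (13): brute-force search for the omega matching the target variance."""
    def mix_var(w):             # Eq. (9)
        m = x_mean + e_mean
        return np.sum(w * (x_var + e_var + m ** 2)) - np.sum(w * m) ** 2
    grid = np.linspace(0.5, 100.0, 400)
    return min(grid, key=lambda om: abs(mix_var(beta_weights(K, om, tau)) - s2_target))
```

When the target variance equals the equal-weight mixture variance, the search returns ω ≈ 2, for which the beta density is uniform and all ranks are weighted equally.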

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).


Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section, we address this issue using data from an operational EPS.



Figure 10. Location of the Châteauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section, we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Châteauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8, the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the


methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{+∞} {F(x) − H(x − y)}² dx,   (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
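For an ensemble of M equally weighted members, the integral (14) with the empirical CDF reduces to the identity CRPS = E|X − y| − ½E|X − X′|, with X and X′ independent draws from the ensemble; a minimal sketch:

```python
import numpy as np

# CRPS of an ensemble forecast, Eq. (14) with F the empirical CDF of the
# members, computed via CRPS = E|X - y| - 0.5 * E|X - X'|.
def crps_ensemble(members, y):
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - y).mean()
    term2 = 0.5 * np.abs(members[:, None] - members[None, :]).mean()
    return term1 - term2

# A one-member "ensemble" reduces to the absolute error, as expected from
# the CRPS being a generalization of the mean absolute error:
print(crps_ensemble([1.5], 3.0))  # 1.5
```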

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS, which shows no skill for all lead times, the method of RS03 only outperforms climatology for day one and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used even when the calibration period is relatively short to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.
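Skill in this comparison is measured by the continuous ranked probability score. A minimal sample-based CRPS estimator for a finite ensemble is sketched below; this is an illustrative helper of ours, not the authors' implementation (their scores follow the decomposition of Hersbach 2000, which is equivalent in expectation):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Sample-based CRPS = E|X - y| - 0.5 * E|X - X'|, with X, X' drawn
    independently from the ensemble.  Lower is better; zero for a perfect
    deterministic forecast."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2
```

Averaging this quantity over a verification period and comparing it with the CRPS of climatology gives the kind of skill comparison summarized in Fig. 11.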

1366 V FORTIN et al

6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μa < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1367

of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space-time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
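The core idea of the Schaake Shuffle can be sketched in a few lines: reorder each forecast margin so that its rank structure matches that of an equally sized template of historical observations. The helper below is our hypothetical sketch, not code from Clark et al. (2004), which also specifies how the template dates are selected:

```python
import numpy as np

def schaake_shuffle(forecast, template):
    """Reorder each column (one location or lead time) of `forecast` so
    that its ranks match those of `template`, thereby transplanting the
    template's space-time rank correlation onto the forecast margins."""
    forecast = np.asarray(forecast, dtype=float)
    template = np.asarray(template, dtype=float)
    out = np.empty_like(forecast)
    for j in range(forecast.shape[1]):
        ranks = np.argsort(np.argsort(template[:, j]))  # rank of each template row
        out[:, j] = np.sort(forecast[:, j])[ranks]      # give forecast values those ranks
    return out
```

Each output column is a permutation of the corresponding forecast column, so the calibrated marginal distributions produced by the weighted members method would be preserved.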

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum-likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π̂_k · f(y_t − x_{t,k} | θ̂_k).    (15)

If the dynamical ensemble members are exchangeable prior to their observation, for example if the same model is used for all members, from initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} f(y_t − x_{t,k} | θ).    (16)
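Equation (16) is simply an equal-weight kernel mixture centred on the members. A minimal sketch, taking f to be a Gaussian kernel N(0, σ²) purely for illustration (the paper does not prescribe this choice):

```python
import numpy as np

def predictive_density(y, members, sigma):
    """Eq. (16): equal-weight mixture over K exchangeable members,
    with the kernel f taken to be N(0, sigma**2) for illustration."""
    members = np.asarray(members, dtype=float)
    z = (y - members) / sigma
    return np.mean(np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi)))
```

Because the weights and kernel parameters are shared across members, this density is invariant under any permutation of the ensemble, which is exactly the exchangeability property discussed above.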

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π̂_k · f_(k)(y_t − x_{t,(k)}).    (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, and which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076-1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243-262

Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1-2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733-1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1-38

Eckel, F. A. 2003 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398-408

Gander, W. and Gautschi, W. 2000 Adaptive quadrature - revisited. BIT Numerical Mathematics, 40, 84-101

Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201-224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33-46

Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559-570

Houdant, B. 2004 'Contribution a l'amelioration de la prevision hydro-meteorologique operationnelle. Pour l'usage des probabilites dans la communication entre acteurs'. PhD thesis, Ecole Nationale du Genie Rural, des Eaux et des Forets (Paris)

Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11-15 January 2004. American Meteorological Society

Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891-906

Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45-58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155-1174

Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16-30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809-818

Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3-57

Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965-986

Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


by drawing randomly from the stationary distribution of the observed process, which would correspond to climatology for weather elements.

Conditions under which the best-member method leads to reliable forecasts are not known. On the other hand, one can think of many examples where it fails. Consider for example an EPS which provides forecasts having the right amount of variance, but no skill. Unless the number of members is so large that the best-member error is always zero, the best-member error distribution will have a variance larger than zero. As the best-member method adds uncorrelated noise to each member, it increases the variance. Hence the statistical ensemble will necessarily have too much variance if the dynamical ensemble already had enough.
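The argument that dressing with uncorrelated noise can only inflate the spread follows from Var(x + ε) = Var(x) + Var(ε) for independent ε, and is easy to check numerically (a sketch of ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)     # members that already have the right variance
eps = rng.normal(0.0, 0.5, size=100_000)   # independent dressing noise
dressed = x + eps
# For independent noise the variances add: dressing can only inflate the
# spread of the ensemble, never reduce it.
```

With the observations also having unit variance, the dressed ensemble here is overdispersive by construction, which is the failure mode described above.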

(a) Rescaling the error patterns to obtain the right amount of variance

WB05 suggest that the error patterns be rescaled so that the covariance of any two randomly picked dressed ensemble members is the same as the covariance of the observations with any one of those two members. This technique has two disadvantages: first, it means that error patterns get distorted, so that one is adding to the ensemble members error patterns that did not actually occur and which may be unfeasible. Secondly, the technique is only applicable when the dynamical ensemble is underdispersive. Indeed, as we still add uncorrelated noise to the dynamical ensemble, we can only increase the variance.

In the conclusion of their paper, WB05 mention that a possible solution to the second problem would be to weight each ensemble member differently. We show in this paper that this can solve both problems. The rationale for treating ensemble members differently, even when all members of the ensemble are generated identically and independently, is as follows: when we dress an ensemble member, we already know the outcome for each member. We can thus use information from the other members, in particular where it lies in the ensemble, measured for example by the distance from the ensemble mean, using the norm already defined to identify the best member of the ensemble. Consider an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the ensemble convex hull. Hence a member which is on the outside of this hull will have more chance of being the best member of the ensemble, and thus can be given more weight. A similar argument can be made for an EPS which produces overdispersive dynamical ensembles: smaller weights should be given to ensemble members which are further away from the ensemble mean.

3. A SYNTHETIC EPS SYSTEM

We shall reuse in this paper the simple synthetic EPS set-up used by WB05 to illustrate the improvements they have proposed to the best-member method. Assume that observations y_t, t = 1, 2, ..., T, are independent normally distributed random variables with zero mean but time-dependent variance σ²_t(y). Assume also that σ²_t(y) is drawn from a chi-squared distribution with three degrees of freedom. An EPS provides a K-member forecast X_t = {x_{t,k}, k = 1, 2, ..., K} for each observation y_t. All ensemble members are independent, identically distributed (iid) normal variates having zero mean and time-dependent variance σ²_t(x), where σ²_t(x) is related to σ²_t(y) by a random relationship σ²_t(x) = a_t · σ²_t(y), a_t being a uniform random variable on the interval [μa − 0.1, μa + 0.1], where μa is the expectation of a_t.
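This set-up is straightforward to reproduce; a minimal simulator (function name and defaults are ours) follows the description above:

```python
import numpy as np

def simulate_eps(T=15_000, K=20, mu_a=0.7, seed=0):
    """Synthetic EPS of WB05 as described in the text:
    y_t ~ N(0, s2_y) with s2_y ~ chi-squared(3), and K iid members
    ~ N(0, a_t * s2_y) with a_t ~ Uniform[mu_a - 0.1, mu_a + 0.1]."""
    rng = np.random.default_rng(seed)
    s2_y = rng.chisquare(3, size=T)                         # sigma^2_t(y)
    y = rng.normal(0.0, np.sqrt(s2_y))                      # observations
    a = rng.uniform(mu_a - 0.1, mu_a + 0.1, size=T)         # spread factor a_t
    x = rng.normal(0.0, np.sqrt(a * s2_y)[:, None], size=(T, K))
    return y, x
```

With μa = 0.7 the members are underdispersive on average (overall member variance ≈ 0.7 × 3 = 2.1, against 3 for the observations), matching the spread-skill construction of the text.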

Figure 1 illustrates the set-up using a directed acyclic graph (DAG), or Bayesian network (Jensen 2001). In an acyclic graph, circles represent unknown parameters and squares represent known parameters and observations. Arrows represent dependence links. Figure 1(a) presents the DAG for the calibration period, during which the observations y_t are known (and thus represented by a square), whereas Fig. 1(b) presents the DAG for the validation period, during which the observations y_t are unknown (and thus represented by a circle).

Figure 1. Directed acyclic graph representing the synthetic simulation set-up used by WB05 for (a) calibration and (b) prediction.

The EPS thus displays a 'spread-skill' relationship: when the observation y_t is less variable, thus more predictable, i.e. σ²_t(y) happens to be small, the observed variance of the ensemble members will tend to be smaller, and conversely. The EPS will be underdispersive if μa < 1, and overdispersive if μa > 1. Because the variable of interest is univariate and the forecasts are unbiased, the best member x*_t of each ensemble is simply identified by minimizing the absolute difference between the observations and the forecasts.

(a) When and why the best-member method fails

To test the best-member method, WB05 have used values of K from 1 to 20, and values of μa from 0.1 to 0.9. As we want to illustrate a method which gives a different weight to each ensemble member, we will focus on EPS having between K = 3 and K = 20 members. We will however consider values of μa from 0.1 to 1.9, as we will be able to deal with overdispersive EPS. As proposed by WB05, each time we will obtain a database of past forecasts by generating T = 15 000 observations y_t and corresponding ensemble forecasts X_t. The different methods will then be compared using statistics computed on statistical ensembles Y_t = {y_{t,k,n}} obtained by dressing a second set of T = 15 000 ensemble forecasts X_t. To obtain accurate statistics, N = 150 statistical ensemble members will be drawn from each ensemble member x_{t,k}.

The observations y_t are drawn from an infinite mixture of normal distributions with zero mean but different variances, the variances being drawn from a chi-squared distribution with ν = 3 degrees of freedom, and thus having an expectation of E[σ²_t(y)] = ν = 3. It follows that the stationary distribution of the process y_t has zero mean, a variance σ²(y) of 3 (equal to the expectation of the chi-squared distribution) and a skewness of zero. Kurtosis, as estimated by recursive adaptive Simpson quadrature using the MATLAB function quad (Gander and Gautschi 2000), is very close to β2 = 5. Recall that kurtosis measures the degree of peakedness of a distribution, that it is defined by the ratio of the fourth centred moment to the square of the variance, β2 = μ4/σ⁴, and that the kurtosis of the normal distribution is β2 = 3 (Weisstein 2002). Kurtosis excess with respect to the normal distribution is defined by γ2 = β2 − 3. A distribution with a high peak and heavier tails than the normal distribution is said to be leptokurtic, and has a positive kurtosis excess, i.e. γ2 > 0. A flat-topped curve with thin tails is said to be platykurtic, and has a negative kurtosis excess, i.e. γ2 < 0. The stationary distribution of the process y_t, with a kurtosis excess of γ2 = 2, is leptokurtic and thus has heavier tails than a normal distribution. The stationary distribution of the dynamical ensemble members X_t = {x_{t,k}} and best-member error ε*_t also has zero mean and zero skewness, and therefore the stationary distribution of the statistical ensemble members Y_t = {y_{t,k,n}} will also have zero mean and zero skewness. However, their variance and kurtosis vary with K, the number of members in the EPS, and μa, the expected amount of underdispersion or overdispersion of the undressed members.

Figure 2. (a) Ratio of the variance of the statistical ensemble members generated by the original best-member method to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).
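The value β2 = 5 can also be verified directly: writing y = σz with σ² ~ χ²₃ and z ~ N(0, 1) independent, E[y²] = 3 and E[y⁴] = E[σ⁴]·E[z⁴] = 15 · 3 = 45, so β2 = 45/3² = 5. A Monte Carlo check (our sketch; the paper used adaptive quadrature instead):

```python
import numpy as np

# y = sigma * z with sigma^2 ~ chi-squared(3) and z ~ N(0, 1), so the
# stationary kurtosis is beta2 = E[y^4] / E[y^2]^2 = 45 / 9 = 5.
rng = np.random.default_rng(42)
s2 = rng.chisquare(3, size=2_000_000)
y = rng.normal(0.0, np.sqrt(s2))
beta2 = np.mean(y**4) / np.mean(y**2) ** 2
```

The estimate converges to 5 as the sample grows, consistent with the quadrature result quoted above.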

Figure 2(a) shows the ratio of the variance of the statistical ensemble members generated by the best-member method to the variance of the observations, as a function of μa and K, while Fig. 2(b) shows the difference between the kurtosis of the ensemble members and the kurtosis of the observations, as a function of μa and K. The thick line on both graphs shows the combinations of μa and K for which the statistical ensemble members have the correct variance and kurtosis.

It can be seen in Fig. 2(a) that the variance of the statistical ensemble increases with the variance of the dynamical ensemble, and decreases with the ensemble size. These two characteristics are not specific to this experiment, but rather much more general. Indeed, the variance of the statistical ensemble can only be larger than that of the dynamical ensemble, and as the number of ensemble members increases, the distance between the best ensemble member and the observation will generally decrease towards zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence the best-member method cannot provide reliable forecasts in a general setting.

Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa).

It is more difficult to explain the behaviour of the kurtosis as a function of μa and K expressed by Fig. 2(b). In this particular case, the kurtosis of the statistical ensemble decreases with μa for μa < 0.5, whereas for μa > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μa. It can be seen that for μa < 0.5 the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence the best-member method starts with ensemble members which already have too-heavy tails. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events

WB05, having shown that the best-member method does not produce ensembles with the correct variance, propose to dress each dynamical ensemble member with an independent error distribution having still a zero mean, but a covariance chosen such that the covariance between any pair of ensemble members be the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

s² = mse(y, x̄) − (1 + 1/K) · s²_x,    (4)

where mse(y, x̄) = T⁻¹ Σ_{t=1}^{T} (y_t − x̄_t)² is the mean square error between the observations y_t and the ensemble mean x̄_t = K⁻¹ Σ_{k=1}^{K} x_{k,t}, computed from a database of T error forecasts and observations. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.


Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor ω = √(s²/s²_{ε*}), where s²_{ε*} = (T − 1)⁻¹ Σ_{t=1}^{T} (ε*_t)² is the estimated variance of the best-member error. The statistical ensemble members will then be given by:

y_{t,k,n} = x_{t,k} + ω · ε_{t,k,n},    (5)

where the innovations ε_{t,k,n} are drawn at random from a database of past error forecasts. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
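The rescaling of Eqs. (4) and (5) can be sketched as follows. This is our hypothetical helper, not the authors' code, and it takes s²_x to be the pooled sample variance of the ensemble members (our reading of the text; consult WB05 for the exact estimator):

```python
import numpy as np

def wb05_omega(y, x, best_errors):
    """Estimate the target dressing variance s2 of Eq. (4), then the
    factor omega of Eq. (5) that rescales archived best-member errors.
    Returns None when s2 < 0, i.e. when the dynamical ensemble is
    already overdispersive and the WB05 rescaling is not applicable."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    K = x.shape[1]
    mse = np.mean((y - x.mean(axis=1)) ** 2)      # mse(y, xbar)
    s2 = mse - (1.0 + 1.0 / K) * x.var(ddof=1)    # Eq. (4)
    if s2 < 0:
        return None
    s2_eps = np.var(best_errors, ddof=1)          # variance of archived errors
    return float(np.sqrt(s2 / s2_eps))            # omega in Eq. (5)
```

Returning None for s² < 0 mirrors the limitation discussed in the text: for an overdispersive dynamical ensemble, no positive dressing variance satisfies the covariance constraint.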

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and round-off errors. Figure 4(b) shows however that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, ..., u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider for example an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member; otherwise it would not be the best member of the ensemble.

Hence the probability that an ensemble member be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured for example by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members, and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_{t,(k)} to be the kth order statistic of an ensemble X_t = {x_{t,k}, k = 1, 2, ..., K}, and ε*_(k) = {y_t − x*_t | x*_t = x_{t,(k)}, t = 1, 2, ..., T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_{t,(k)}, i.e. p_k = Pr[x*_t = x_{t,(k)}].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μa = 0.3 (an underdispersive ensemble) and for μa = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μa: as predicted, the lowest and highest ensemble members have a higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μa = 0.3 and μa = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence it does not make a lot of sense to dress each ensemble member with the same error distribution.
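Estimating p_k and building the per-rank error archives ε*_(k) from a database of forecasts is straightforward bookkeeping; the helper below is our sketch of it, using the absolute-difference norm of this univariate set-up:

```python
import numpy as np

def rank_statistics(y, x):
    """For each forecast, find which order statistic was the best member
    (smallest |x_{t,(k)} - y_t|), then estimate p_k and collect the
    per-rank archives of best-member errors eps*_(k)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    T, K = x.shape
    xs = np.sort(x, axis=1)                               # order statistics x_{t,(k)}
    best = np.argmin(np.abs(xs - y[:, None]), axis=1)     # rank of the best member
    p = np.bincount(best, minlength=K) / T                # p_k estimates
    archives = [y[best == k] - xs[best == k, k] for k in range(K)]
    return p, archives
```

For a strongly underdispersive synthetic EPS, the estimated p_k is U-shaped (extreme ranks favoured), reproducing the behaviour shown in Fig. 5(a).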

Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we instead resample from ε*_(k) to obtain dressed ensemble members:

y_{t,k,n} = x_{t,(k)} + ω · ε_{t,(k),n},    (6)

where ε_{t,(k),n} is drawn at random from ε*_(k).

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_{t,(k)}. However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_{t(k)}, and an independent error term ε_{t(k),n} resampled from ε*_{(k)}. Let μ(x_{(k)}) and σ²(x_{(k)}) be respectively the mean and variance of x_{t(k)}, and μ(ε_{(k)}) and σ²(ε_{(k)}) be the mean and variance of ε_{t(k),n}. Denote by y_{(k)} a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_{(k)} are respectively given by μ(y_{(k)}) = μ(x_{(k)}) + μ(ε_{(k)}) and σ²(y_{(k)}) = σ²(x_{(k)}) + σ²(ε_{(k)}). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_{(k)}), where the probability of each distribution corresponds to the proportion w_k of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble:

p(y) = \sum_{k=1}^{K} w_k \cdot p(y_{(k)})    (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

\sigma^2(y) = \sum_{k=1}^{K} w_k \{ \sigma^2(x_{(k)}) + \sigma^2(\varepsilon_{(k)}) + (\mu(x_{(k)}) + \mu(\varepsilon_{(k)}))^2 \} - \Big\{ \sum_{k=1}^{K} w_k (\mu(x_{(k)}) + \mu(\varepsilon_{(k)})) \Big\}^2    (8)
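Equation (8) is the standard variance formula for a finite mixture, and is easy to verify by simulation. In the sketch below the component means and variances stand in for μ(x_{(k)}) + μ(ε_{(k)}) and σ²(x_{(k)}) + σ²(ε_{(k)}); Gaussian components are used only for the check, since Eq. (8) holds whatever the component distributions are:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 5
w = rng.dirichlet(np.ones(K))       # mixing weights w_k (sum to 1)
mu = rng.normal(0.0, 1.0, K)        # component means mu(x_(k)) + mu(eps_(k))
var = rng.uniform(0.5, 2.0, K)      # component variances sigma^2(x_(k)) + sigma^2(eps_(k))

# Eq. (8): variance of the finite mixture.
var_mix = np.sum(w * (var + mu**2)) - np.sum(w * mu) ** 2

# Monte Carlo check: draw a component with probability w_k, then a value from it.
n = 400_000
comp = rng.choice(K, size=n, p=w)
y = rng.normal(mu[comp], np.sqrt(var[comp]))
```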

Given estimates of μ(x_{(k)}), μ(ε_{(k)}), σ²(x_{(k)}) and σ²(ε_{(k)}), respectively denoted by x̄_{(k)}, ε̄_{(k)}, s²_{x(k)} and s²_{ε(k)}, we can hence obtain an estimate ŝ²_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which display, on average, the correct variance, we would like to find weights w_k for which ŝ²_y = s²_y, the variance of the observations over the same period:

s_y^2 = \sum_{k=1}^{K} w_k ( s_{x(k)}^2 + s_{\varepsilon(k)}^2 + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 ) - \Big( \sum_{k=1}^{K} w_k (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \Big)^2    (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; \omega, \tau) = \frac{ x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1} }{ B(\omega\tau, \omega - \omega\tau) }, \qquad \omega > 0, \; 0 \le \tau \le 1    (10)

where B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1} \, dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5),


skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is, however, defined on the interval [0, 1], and we are of course looking for a probability distribution defined on the set {1, 2, . . . , K}. We therefore suggest restricting the weights w_k to be proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(\omega, \tau) = \frac{ \int_{(k-1)/K}^{k/K} x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1} \, dx }{ B(\omega\tau, \omega - \omega\tau) }    (11)

The expectation of the beta distribution being given by τ, we propose to constrain the weight function further by choosing τ equal to the (rescaled) expectation of p_k:

\tau = \sum_{k=1}^{K} \int_{(k-1)/K}^{k/K} K p_k \, x \, dx = \frac{1}{K} \sum_{k=1}^{K} k \cdot p_k - \frac{1}{2K}    (12)

In this way, the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω

which minimizes the difference between ŝ²_y and s²_y:

\hat{\omega} = \arg\min_{\omega} \Big| \sum_{k=1}^{K} w_k(\omega, \tau) ( s_{x(k)}^2 + s_{\varepsilon(k)}^2 + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 ) - \Big( \sum_{k=1}^{K} w_k(\omega, \tau) (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \Big)^2 - s_y^2 \Big|    (13)
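Equations (11)–(13) can be implemented in a few lines once the per-rank statistics are available. The sketch below is an assumption-laden illustration, not the paper's code: it uses SciPy's beta distribution for Eq. (11), fixes τ via Eq. (12), and searches for ω with a bounded scalar minimization; the per-rank statistics are invented toy values.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import beta

def weights(omega, tau, K):
    """Eq. (11): w_k proportional to the beta(omega*tau, omega*(1-tau))
    probability mass on the interval ((k-1)/K, k/K]."""
    edges = np.linspace(0.0, 1.0, K + 1)
    cdf = beta.cdf(edges, omega * tau, omega * (1.0 - tau))
    return np.diff(cdf)

def calibrate_omega(xbar, ebar, s2x, s2e, s2y, p_k):
    """Fix tau from p_k (Eq. (12)), then pick omega matching s2y (Eq. (13))."""
    K = len(p_k)
    tau = np.sum(np.arange(1, K + 1) * p_k) / K - 1.0 / (2.0 * K)   # Eq. (12)
    mu = xbar + ebar                                                # per-rank means
    def gap(omega):
        w = weights(omega, tau, K)
        s2_hat = np.sum(w * (s2x + s2e + mu**2)) - np.sum(w * mu) ** 2  # Eq. (9)
        return abs(s2_hat - s2y)                                        # Eq. (13)
    res = minimize_scalar(gap, bounds=(0.1, 200.0), method="bounded")
    return res.x, tau

# Toy per-rank statistics (hypothetical, for illustration only):
K = 10
xbar = np.linspace(-2.0, 2.0, K)    # means of the order statistics
ebar = np.zeros(K)                  # unbiased per-rank errors
s2x = np.full(K, 0.2)
s2e = np.full(K, 0.5)
p_k = np.full(K, 1.0 / K)           # symmetric p_k, so tau = 0.5
omega, tau = calibrate_omega(xbar, ebar, s2x, s2e, s2y=1.5, p_k=p_k)
```

With these toy inputs the target variance is reachable, so the optimized ω drives the mixture variance of Eq. (9) onto s²_y; as noted in the text, for some ensembles no ω can reduce the variance enough, and the residual gap stays positive.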

(c) The weighted members method applied to the synthetic EPS

We have applied the method proposed in this section to the synthetic EPS of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in the lower right corner of Fig. 7(a) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

Figure 8. Normal probability plot comparing the stationary distribution of the observations (+) with that of ensemble members obtained using the method of WB05 (thick line) and using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members, as a function of μ_a and K.

dynamical ensembles with very few members. This is better understood by looking at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that, for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_{(k)}. Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section, we address this issue using data from an operational EPS.


Figure 10. Location of the Chateauguay river basin, near Montréal on the Canada–USA border; the closest GFS grid point is also shown.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section, we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that, for days 1 to 8, the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the


methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both the sharpness and the reliability of a probabilistic forecast (Hersbach 2000):

\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \{ F(x) - H(x - y) \}^2 \, dx    (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
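For an ensemble, the CRPS of the empirical cumulative distribution function has a well-known closed form, E|X − y| − ½ E|X − X′| with X, X′ drawn independently from the ensemble, which avoids evaluating the integral in Eq. (14) explicitly. A sketch:

```python
import numpy as np

def crps_ensemble(members, y):
    """CRPS of the empirical CDF of `members` against observation `y`.

    Exactly equal to Eq. (14) with F the empirical step CDF, via the
    identity CRPS = E|X - y| - 0.5 E|X - X'| (same units as the predictand).
    """
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2
```

For a one-member "ensemble" this reduces to the absolute error, consistent with the CRPS being a generalization of the mean absolute error; scoring a system then amounts to averaging this quantity over many forecast/observation pairs.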

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then, the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.
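The centring step described above can be sketched as follows (the helper name and the toy biased forecasts are ours, not from the paper):

```python
import numpy as np

def best_member_ranks(X, y, x_clim_mean, y_clim_mean):
    """Identify the best member of each ordered ensemble after removing the
    calibration-period means, so the choice is not driven by forecast bias."""
    Xc = np.sort(X, axis=1) - x_clim_mean       # centred order statistics
    yc = np.asarray(y) - y_clim_mean            # centred observations
    return np.argmin(np.abs(Xc - yc[:, None]), axis=1)   # 0-based rank index

rng = np.random.default_rng(3)
X = 2.0 + rng.normal(size=(1000, 15))   # toy forecasts with a +2 bias
y = rng.normal(size=1000)               # toy observations
ranks = best_member_ranks(X, y, X.mean(), y.mean())
```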

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose simply to replace any negative forecast with zero. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11. CRPS (mm) of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two years (Jan 1979–Dec 1980) and (b) six years (Jan 1979–Dec 1984). Curves shown: climatology, EPS, RS03, WB05 and the weighted members method.

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_{(k)} was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method, to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine whether the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by their distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation–Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function, and leads to maximum-likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f(y_t - x_{t,k} \mid \hat{\theta}_k)    (15)
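A minimal sketch of the BMA predictive mixture of Eq. (15), using Gaussian kernels for f for concreteness (the text leaves f generic; the member forecasts, weights and kernel width below are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

def bma_predictive_pdf(y, x, pi, sigma):
    """Eq. (15): mixture of kernels f centred on the member forecasts x_k.
    Gaussian kernels with common scale sigma stand in for f(. | theta_k)."""
    y = np.atleast_1d(y)
    dens = norm.pdf(y[:, None], loc=np.asarray(x)[None, :], scale=sigma)
    return dens @ np.asarray(pi)       # weighted sum over the K components

x = np.array([1.0, 2.0, 4.0])          # member forecasts x_{t,k}
pi = np.array([0.5, 0.3, 0.2])         # estimated weights pi_k (sum to 1)
grid = np.linspace(-2.0, 8.0, 201)     # evaluation grid, spacing 0.05
p = bma_predictive_pdf(grid, x, pi, sigma=1.0)
```

Since the weights sum to one and each kernel is a density, the mixture integrates to one over a wide enough grid.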

If the dynamical ensemble members are exchangeable prior to their observation, for example if the same model is used for all members, from initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta)    (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_{(k)} of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f_{(k)}(y_t - x_{t(k)})    (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y., 2005: A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R., 2004: The Schaake Shuffle: a method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorol., 5, 243–262

Collier, C. G. and Krzysztofowicz, R., 2000: Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R., 1979: The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B., 1977: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B, 39, 1–38

Eckel, F. A., 2003: 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U., 2005: A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W., 2000: Adaptive quadrature revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C., 2006: Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H., 2003: Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L., 2006: Reforecasts: an important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B., 2004: 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris

Jensen, F. V., 2001: Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R., 2004: 'Bayesian processor of output: a new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R., 2004: Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R., 1981: The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M., 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A., 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z., 1997: An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P., 1996: Inferences from multinomial data: learning about a bag of marbles. J. R. Stat. Soc. Series B, 58, 3–57

Wang, X. and Bishop, C. H., 2005: Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W., 2002: CRC concise encyclopedia of mathematics, 2nd edition. CRC Press


Figure 1. Directed acyclic graph representing the synthetic simulation set-up used by WB05 for (a) calibration and (b) prediction.

squares represent known parameters and observations. Arrows represent dependence links. Figure 1(a) presents the DAG for the calibration period, during which the observations y_t are known (and thus represented by a square), whereas Fig. 1(b) presents the DAG for the validation period, during which the observations y_t are unknown (and thus represented by a circle).

The EPS thus displays a 'spread-skill' relationship: when the observation yt is less variable, and thus more predictable, i.e. σ²t(y) happens to be small, the observed variance of the ensemble members will tend to be smaller, and conversely. The EPS will be underdispersive if μa < 1 and overdispersive if μa > 1. Because the variable of interest is univariate and the forecasts are unbiased, the best member x*t of each ensemble is simply identified by minimizing the absolute difference between the observations and the forecasts.

(a) When and why the best-member method fails

To test the best-member method, WB05 used values of K from 1 to 20 and values of μa from 0.1 to 0.9. As we want to illustrate a method which gives a different weight to each ensemble member, we will focus on EPS having between K = 3 and K = 20 members. We will however consider values of μa from 0.1 to 1.9, as we will be able to deal with overdispersive EPS. As proposed by WB05, each time we will obtain a database of past forecasts by generating T = 15 000 observations yt and corresponding ensemble forecasts Xt. The different methods will then be compared using statistics computed on statistical ensembles Yt = {ytkn}, obtained by dressing a second set of T = 15 000 ensemble forecasts Xt. To obtain accurate statistics, N = 150 statistical ensemble members will be drawn from each ensemble member xtk.

The observations yt are drawn from an infinite mixture of normal distributions with zero mean but different variances, the variances being drawn from a chi-squared distribution with ν = 3 degrees of freedom, and thus having an expectation of E[σ²t(y)] = ν = 3. It follows that the stationary distribution of the process yt has zero mean, a variance σ²(y) of 3 (equal to the expectation of the chi-squared distribution), and a skewness of zero. Kurtosis, as estimated by recursive adaptive Simpson quadrature using the MATLAB function quad (Gander and Gautschi 2000), is very close to β2 = 5. Recall that kurtosis measures the degree of peakedness of a distribution, that it is defined by the ratio of the fourth centred moment to the square of the variance, β2 = μ4/σ⁴, and

1354 V FORTIN et al

Figure 2. (a) Ratio of the variance of the statistical ensemble members generated by the original best-member method to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

that the kurtosis of the normal distribution is β2 = 3 (Weisstein 2002). Kurtosis excess with respect to the normal distribution is defined by γ2 = β2 − 3. A distribution with a high peak and heavier tails than the normal distribution is said to be leptokurtic, and has a positive kurtosis excess, i.e. γ2 > 0. A flat-topped curve with thin tails is said to be platykurtic, and has a negative kurtosis excess, i.e. γ2 < 0. The stationary distribution of the process yt, with a kurtosis excess of γ2 = 2, is leptokurtic and thus has heavier tails than a normal distribution. The stationary distribution of the dynamical ensemble members Xt = {xtk} and best-member error ε*t also has zero mean and zero skewness, and therefore the stationary distribution of the statistical ensemble members Yt = {ytkn} will also have zero mean and zero skewness. However, their variance and kurtosis vary with K, the number of members in the EPS, and μa, the expected amount of underdispersion or overdispersion of the undressed members.
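The stationary moments quoted above can be checked directly: for a scale mixture y = σz with z ~ N(0, 1) and σ² drawn from a chi-squared with ν degrees of freedom, E[y²] = E[σ²] = ν and E[y⁴] = 3E[σ⁴] = 3(2ν + ν²), giving β2 = 3(ν + 2)/ν = 5 for ν = 3. A short Monte Carlo sketch of this check (in Python, rather than the paper's MATLAB quadrature):

```python
import numpy as np

# Monte Carlo check of the stationary moments of y_t for the assumed
# set-up: y = sigma * z with z ~ N(0, 1) and sigma^2 ~ chi-squared(nu).
rng = np.random.default_rng(42)
nu, n = 3, 2_000_000

var = rng.chisquare(nu, size=n)      # one variance per observation
y = rng.normal(0.0, np.sqrt(var))    # the observation itself

m2 = np.mean(y**2)                   # variance, expected to be nu = 3
beta2 = np.mean(y**4) / m2**2        # kurtosis, expected to be close to 5

# Exact value: E[y^4] = 3 E[sigma^4] = 3 (2 nu + nu^2),
# so beta2 = 3 (nu + 2) / nu = 5 for nu = 3.
beta2_exact = 3 * (nu + 2) / nu
print(m2, beta2, beta2_exact)
```

The excess kurtosis γ2 = 2 quoted in the text follows immediately from β2 = 5.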

Figure 2(a) shows the ratio of the variance of the statistical ensemble members generated by the best-member method to the variance of the observations as a function of μa and K, while Fig. 2(b) shows the difference between the kurtosis of the ensemble members and the kurtosis of the observations as a function of μa and K. The thick line on both graphs shows the combinations of μa and K for which the statistical ensemble members have the correct variance and kurtosis.

It can be seen in Fig. 2(a) that the variance of the statistical ensemble increases with the variance of the dynamical ensemble and decreases with the ensemble size. These two characteristics are not specific to this experiment, but are in fact much more general. Indeed, the variance of the statistical ensemble can only be larger than that of the dynamical ensemble, and as the number of ensemble members increases, the distance between the best ensemble member and the observation will generally decrease towards


Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations as a function of the expected amount of underdispersion (μa).

zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence the best-member method cannot provide reliable forecasts in a general setting.

It is more difficult to explain the behaviour of the kurtosis as a function of μa and K, shown in Fig. 2(b). In this particular case the kurtosis of the statistical ensemble decreases with μa for μa < 0.5, whereas for μa > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μa. It can be seen that for μa < 0.5 the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence the best-member method starts with ensemble members whose tails are already too heavy. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events

Having shown that the best-member method does not produce ensembles with the correct variance, WB05 propose to dress each dynamical ensemble member with an independent error distribution, still having zero mean, but with a covariance chosen such that the covariance between any pair of ensemble members is the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

s² = mse(y, x̄) − (1 + 1/K) · s²x,   (4)

where mse(y, x̄) = T⁻¹ ∑t (yt − x̄t)² is the mean square error between the observations yt and the ensemble mean x̄t = K⁻¹ ∑k xkt, computed from a database of T past forecasts and observations. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.


Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*t drawn from an archive of past forecasts by a factor ω = √(s²/s²ε*), where s²ε* = (T − 1)⁻¹ ∑t (ε*t)² is the estimated variance of the best-member error. The statistical ensemble members will then be given by

ytkn = xtk + ω · εtkn,   (5)

where the innovations εtkn are drawn at random from a database of past error forecasts. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
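The rescaled best-member method of Eqs. (4) and (5) can be sketched as follows. This is a minimal illustration, not the authors' code: the array names are made up, and s²x is taken here to be the mean sample variance of the archived ensembles, which is one plausible reading of Eq. (4):

```python
import numpy as np

def wb05_dress(X, y, best_err, x_new, rng, n_draws=10):
    """Dress a new K-member ensemble with rescaled best-member errors.

    X        : (T, K) archive of past ensemble forecasts
    y        : (T,)   matching observations
    best_err : (T,)   archived best-member errors eps*_t
    x_new    : (K,)   the ensemble to dress
    """
    T, K = X.shape
    xbar = X.mean(axis=1)
    mse = np.mean((y - xbar) ** 2)             # mse(y, xbar)
    s2x = np.mean(X.var(axis=1, ddof=1))       # mean ensemble variance (assumed)
    s2 = mse - (1.0 + 1.0 / K) * s2x           # Eq. (4)
    if s2 < 0:
        raise ValueError("ensemble already overdispersive: method not applicable")
    omega = np.sqrt(s2 / np.var(best_err, ddof=1))
    # Eq. (5): each member is dressed with n_draws resampled, rescaled errors.
    eps = rng.choice(best_err, size=(K, n_draws))
    return x_new[:, None] + omega * eps
```

For an underdispersive archive ω > 1, inflating the resampled errors; when the undressed members were already too spread out, s² goes negative and the method fails, as discussed above.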

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, the departures being likely caused by sampling uncertainty and round-off errors. Figure 4(b) shows, however, that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u1, …, uK} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider for example an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence the extrema of the ensemble will have a much greater chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have a greater chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble's empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member (otherwise it would not be the best member of the ensemble).

Hence the probability that an ensemble member is the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts this location can be measured, for example, by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble as well as the mean and variance of the error distribution. Define xt(k) to be the kth order statistic of an ensemble Xt = {xtk, k = 1, 2, …, K}, and ε*(k) = {yt − x*t | x*t = xt(k), t = 1, 2, …, T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also pk to be the probability that the best member is xt(k), i.e. pk = Pr[x*t = xt(k)].

Figure 5 shows pk as a function of k for K = 20 ensemble members, for μa = 0.3 (an underdispersive ensemble) and for μa = 1.7 (an overdispersive ensemble). It can be seen that pk depends on k as well as on μa: as predicted, the lowest and highest ensemble members have a higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*(k), again for μa = 0.3 and μa = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence it does not make much sense to dress each ensemble member with the same error distribution.

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we resample


Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

from ε*(k), to obtain dressed ensemble members:

ytkn = xt(k) + ω · εt(k)n,   (6)

where εt(k)n is drawn at random from ε*(k).
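Building the rank-conditional archives ε*(k) and the probabilities pk from a forecast database, and then dressing by rank as in Eq. (6), can be sketched as follows (a hypothetical Python implementation with ω fixed at 1 for simplicity; the paper does not give code):

```python
import numpy as np

def rank_error_archives(X, y):
    """Sort each archived ensemble, find which order statistic was the best
    member, and collect best-member errors separately for each rank k."""
    T, K = X.shape
    Xs = np.sort(X, axis=1)                          # x_t(1) <= ... <= x_t(K)
    best = np.argmin(np.abs(Xs - y[:, None]), axis=1)
    p_k = np.bincount(best, minlength=K) / T         # p_k = Pr[x*_t = x_t(k)]
    archives = [y[best == k] - Xs[best == k, k] for k in range(K)]
    return p_k, archives

def dress_by_rank(x_new, archives, rng, n=10, omega=1.0):
    """Eq. (6): y_tkn = x_t(k) + omega * eps_t(k)n, with eps resampled from
    the archive of the matching rank (empty archives are skipped)."""
    xs = np.sort(x_new)
    parts = [xs[k] + omega * rng.choice(archives[k], size=n)
             for k in range(len(xs)) if len(archives[k]) > 0]
    return np.concatenate(parts)
```

The empty-archive case handled here is exactly the data-requirement limitation discussed in section 4(d): a short calibration period may contain no example of some rank being the best member.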

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then one possibility would be to draw Nk = pk · M dressed ensemble members from xt(k). However, as we are still not drawing the statistical members Yt from the conditional distribution p(yt | Xt), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let wk be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, xt(k), and an independent error term εt(k)n resampled from ε*(k). Let μ(x(k)) and σ²(x(k)) be respectively the mean and variance of xt(k), and μ(ε(k)) and σ²(ε(k)) be the mean and variance of εt(k)n. Denote by y(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y(k) are respectively given by μ(y(k)) = μ(x(k)) + μ(ε(k)) and σ²(y(k)) = σ²(x(k)) + σ²(ε(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y(k)), where the probability of each distribution corresponds to the proportion of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble, wk:

p(y) = ∑_{k=1}^{K} wk · p(y(k)).   (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

σ²(y) = ∑_{k=1}^{K} wk (σ²(x(k)) + σ²(ε(k)) + (μ(x(k)) + μ(ε(k)))²) − (∑_{k=1}^{K} wk (μ(x(k)) + μ(ε(k))))².   (8)

Given estimates of μ(x(k)), μ(ε(k)), σ²(x(k)) and σ²(ε(k)), respectively denoted by x̄(k), ε̄(k), s²x(k) and s²ε(k), we can hence obtain an estimate s²ŷ of σ²(y) for a given set of weights wk. In order to have statistical ensemble members which have, on average, the correct variance, we would like to find weights wk for which s²ŷ = s²y, the observed variance of the observations over the same period:

s²y = ∑_{k=1}^{K} wk (s²x(k) + s²ε(k) + (x̄(k) + ε̄(k))²) − (∑_{k=1}^{K} wk (x̄(k) + ε̄(k)))².   (9)
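Equation (8) is the standard variance decomposition of a finite mixture, and is cheap to evaluate for any candidate set of weights. A sketch (a hypothetical helper, not from the paper):

```python
import numpy as np

def mixture_variance(w, mx, me, vx, ve):
    """Variance of the finite mixture of Eq. (8).

    w      : (K,) mixture weights w_k (summing to 1)
    mx, me : (K,) means of x_(k) and of eps_(k)
    vx, ve : (K,) variances of x_(k) and of eps_(k)
    """
    m = np.asarray(mx) + np.asarray(me)       # component means of y_(k)
    second = np.sum(np.asarray(w) * (np.asarray(vx) + np.asarray(ve) + m**2))
    first = np.sum(np.asarray(w) * m)         # E[y], so var = E[y^2] - E[y]^2
    return second - first**2
```

As a check, equal weights on two components with means ±1, unit variances and no error term give a mixture variance of 2: within-component variance 1 plus between-component variance 1.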

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of pk, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function pk is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate pk is the beta probability density function. We use here the parametrization proposed by Walley (1996):

fB(x; ω, τ) = x^(ω·τ−1) (1 − x)^(ω−ω·τ−1) / B(ω·τ, ω − ω·τ),   ω > 0, 0 ≤ τ ≤ 1,   (10)

where B(α, β) = ∫₀¹ x^(α−1) (1 − x)^(β−1) dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is, however, defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, …, K}. We therefore suggest restricting the weights wk to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

wk(ω, τ) = ∫_{(k−1)/K}^{k/K} x^(ω·τ−1) (1 − x)^(ω−ω·τ−1) dx / B(ω·τ, ω − ω·τ).   (11)

The expectation of the beta distribution being given by τ, we propose to further constrain the weight function by choosing τ equal to the (rescaled) expectation of pk:

τ = ∑_{k=1}^{K} K·pk ∫_{(k−1)/K}^{k/K} x dx = (1/K) ∑_{k=1}^{K} k·pk − 1/(2K).   (12)

In this way the parameter τ can account for the asymmetry of the probabilities pk. Note that for the synthetic EPS the probabilities pk are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between s²ŷ and s²y:

ω = arg min_ω | ∑_{k=1}^{K} wk(ω, τ) · (s²x(k) + s²ε(k) + (x̄(k) + ε̄(k))²) − (∑_{k=1}^{K} wk(ω, τ) · (x̄(k) + ε̄(k)))² − s²y |.   (13)
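Equations (11)-(13) amount to a one-dimensional search over ω once τ has been fixed by Eq. (12). A sketch using SciPy's beta CDF and a bounded scalar minimizer (an assumed implementation choice; the paper does not specify how the minimization is carried out):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import beta as beta_dist

def tau_from_pk(p_k):
    """Eq. (12): the rescaled expectation of p_k."""
    K = len(p_k)
    return np.dot(np.arange(1, K + 1), p_k) / K - 1.0 / (2 * K)

def beta_weights(omega, tau, K):
    """Eq. (11): w_k is the Beta(omega*tau, omega*(1-tau)) mass of bin k."""
    a, b = omega * tau, omega * (1.0 - tau)
    cdf = beta_dist.cdf(np.linspace(0.0, 1.0, K + 1), a, b)
    return np.diff(cdf)

def fit_omega(tau, mx, me, vx, ve, s2y, bounds=(1e-3, 200.0)):
    """Eq. (13): choose omega so the mixture variance matches s2y."""
    K = len(mx)
    m = np.asarray(mx) + np.asarray(me)
    def loss(omega):
        w = beta_weights(omega, tau, K)
        s2 = np.sum(w * (np.asarray(vx) + np.asarray(ve) + m**2)) - np.sum(w * m) ** 2
        return abs(s2 - s2y)
    return minimize_scalar(loss, bounds=bounds, method="bounded").x
```

With ω = 2 and τ = 0.5 the beta density is uniform, so all wk = 1/K; ω > 2 concentrates weight on central ranks and ω < 2 on extreme ranks, mirroring the discussion of Fig. 9 below.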

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that the method works just as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μa = 0.3 and K = 20 is β2 = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications of the first 100 ensemble forecasts (each ensemble consisting of M = N · K = 3000 members) and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

Figure 8. Normal probability plot comparing the stationary distribution of the observations (+) with that of ensemble members obtained using the method of WB05 (thick line) and using the weighted members method proposed in this paper (scored line), for μa = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members, as a function of μa and K.

dynamical ensembles with very few members. This is better understood by looking at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice we would like the product Nk = pk · M to be smaller than the size of the archive ε*(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.


Figure 10. Location of the Chateauguay river basin. (Map labels: Montréal, St. Lawrence River, Ottawa River, the Canada–USA border, and the nearest GFS grid point.)

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{∞} (F(x) − H(x − y))² dx,   (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
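For an empirical ensemble, Eq. (14) can be evaluated exactly through the standard kernel identity CRPS(F, y) = E_F|X − y| − ½ E_F|X − X′|, with X and X′ independent draws from F. A sketch (Python, using the identity rather than numerical integration; not code from the paper):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an empirical ensemble forecast against a scalar observation,
    via CRPS = E|X - y| - 0.5 * E|X - X'|, which equals Eq. (14) when F is
    the empirical CDF of the members."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

# A one-member 'ensemble' reduces to the absolute error, illustrating the
# link with the mean absolute error noted in the text; a two-member
# ensemble {0, 1} scored against y = 0.5 gives 0.25.
print(crps_ensemble([2.0], 5.0))        # 3.0
print(crps_ensemble([0.0, 1.0], 0.5))   # 0.25
```

The pairwise term makes this O(M²) in the ensemble size M; for the M = 225-member ensembles of the next subsection this is negligible.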

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.
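The centring step might be sketched as follows (hypothetical names; the calibration-period means x_clim and y_clim are assumed to have been computed beforehand, and subtracting a single overall mean is one plausible reading of 'centre'):

```python
import numpy as np

def centred_best_member(X, y, x_clim, y_clim):
    """Centre forecasts and observations before picking the best member,
    so that forecast bias does not distort the choice.

    X      : (T, K) raw ensemble forecasts
    y      : (T,)   raw observations
    x_clim : mean forecast over the calibration period
    y_clim : mean observation over the calibration period
    """
    Xc = X - x_clim                                   # centred forecasts
    yc = y - y_clim                                   # centred observations
    best = np.argmin(np.abs(Xc - yc[:, None]), axis=1)
    best_err = yc - Xc[np.arange(len(y)), best]
    # Statistical members built from Xc are de-biased afterwards by adding
    # back the observed calibration-period mean y_clim.
    return best, best_err
```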

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment we chose to simply replace any negative forecast with zero. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11. CRPS (mm) of four ensemble prediction systems (the raw EPS, RS03, WB05, and the weighted members method) and of climatology, as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period. It can be seen that although the methods of RS03 and WB05 always score better than the dynamical EPS, which shows no skill for all lead times, the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are however not prohibitive, since we obtained good results when using two years instead of one for calibration.
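The skill comparison above relies on the continuous ranked probability score. The paper uses the decomposition of Hersbach (2000); for a single scalar forecast, the empirical CRPS of an ensemble can be sketched as follows (function name is ours, not the authors' code):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of a scalar ensemble forecast:
    CRPS = E|X - y| - 0.5 * E|X - X'|, with X, X' drawn from the ensemble."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - obs).mean()                        # mean distance to the verification
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()  # mean pairwise member distance
    return term1 - term2
```

For a one-member ensemble the score reduces to the absolute error, which is why the CRPS is often described as a probabilistic generalization of the mean absolute error.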

1366 V FORTIN et al

6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μa < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case for example for hydrological forecasting, where the inputs are certainly multidimensional, but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by their distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm; we may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members, otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
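As an illustration of the first option, the reordering step of the Schaake Shuffle (Clark et al. 2004) can be sketched as follows, assuming no ties among the historical observations (all names are ours):

```python
import numpy as np

def schaake_shuffle(ensemble, historical):
    """Reorder an M x d ensemble (M members, d variables) so that its
    rank-correlation structure across variables matches that of M historical
    observation vectors of the same shape (reordering step of Clark et al. 2004)."""
    ens = np.sort(np.asarray(ensemble, float), axis=0)    # sorted values, per variable
    hist = np.asarray(historical, float)
    ranks = np.argsort(np.argsort(hist, axis=0), axis=0)  # rank of each historical value
    out = np.empty_like(ens)
    for j in range(ens.shape[1]):
        # member i receives the ensemble value whose rank matches hist[i, j]
        out[:, j] = ens[ranks[:, j], j]
    return out
```

The marginal distributions of the ensemble are untouched (each column is a permutation of itself); only the joint rank structure is borrowed from the historical record.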

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation–Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

$$ p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \, f(y_t - x_{t,k} \mid \hat{\theta}_k) \qquad (15) $$

If the dynamical ensemble members are exchangeable prior to their observation, for example if the same model is used for all members, with initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

$$ p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta) \qquad (16) $$
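For concreteness, Eq. (16) can be evaluated once a kernel f is chosen; the sketch below takes f Gaussian with standard deviation sigma, an assumption made purely for illustration (the paper leaves f general, and the function name is ours):

```python
import math

def predictive_density(y, members, sigma):
    """Eq. (16): equal-weight mixture of a common error kernel centred on each
    of the K exchangeable members; here the kernel is Gaussian with s.d. sigma."""
    K = len(members)
    return sum(
        math.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        for x in members
    ) / K
```

With a single member the mixture collapses to the kernel itself, which recovers the classical "dressing" of a deterministic forecast.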

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

$$ p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \, f_{(k)}(y_t - x_{t,(k)}) \qquad (17) $$

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture; joint estimation of the mixing proportions and probability density functions is not feasible.

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262
Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2
Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742
Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38
Eckel, F. A. 2003 'Effective mesoscale short-range ensemble forecasting'. PhD thesis, University of Washington
Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408
Gander, W. and Gautschi, W. 2000 Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101
Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press
Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224
Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46
Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570
Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)
Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York
Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm
Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906
Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58
Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174
Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30
Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818
Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57
Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986
Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


Figure 2. (a) Ratio of the variance of the statistical ensemble members generated by the original best-member method to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

that the kurtosis of the normal distribution is β2 = 3 (Weisstein 2002). Kurtosis excess with respect to the normal distribution is defined by γ2 = β2 − 3. A distribution with a high peak and heavier tails than the normal distribution is said to be leptokurtic, and has a positive kurtosis excess, i.e. γ2 > 0. A flat-topped curve with thin tails is said to be platykurtic, and has a negative kurtosis excess, i.e. γ2 < 0. The stationary distribution of the process y_t, with a kurtosis excess of γ2 = 2, is leptokurtic and thus has heavier tails than a normal distribution. The stationary distribution of the dynamical ensemble members X_t = {x_{t,k}} and of the best-member error ε*_t also has zero mean and zero skewness, and therefore the stationary distribution of the statistical ensemble members Y_t = {y_{t,k,n}} will also have zero mean and zero skewness. However, their variance and kurtosis vary with K, the number of members in the EPS, and μa, the expected amount of underdispersion or overdispersion of the undressed members.
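A sample estimate of the kurtosis excess γ2 defined above can be sketched as follows (a plain moment estimator, used here only for illustration; the function name is ours):

```python
import numpy as np

def kurtosis_excess(sample):
    """Sample kurtosis excess gamma_2 = beta_2 - 3, where beta_2 is the ratio of
    the fourth central moment to the squared second central moment.
    Zero for a normal distribution, positive for leptokurtic samples."""
    x = np.asarray(sample, dtype=float)
    z = x - x.mean()
    beta2 = (z ** 4).mean() / (z ** 2).mean() ** 2
    return beta2 - 3.0
```

A symmetric two-point sample is maximally platykurtic (β2 = 1, so γ2 = −2), which is a convenient sanity check for the estimator.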

Figure 2(a) shows the ratio of the variance of the statistical ensemble members generated by the best-member method to the variance of the observations, as a function of μa and K, while Fig. 2(b) shows the difference between the kurtosis of the ensemble members and the kurtosis of the observations, as a function of μa and K. The thick line on both graphs shows the combinations of μa and K for which the statistical ensemble members have the correct variance and kurtosis.

It can be seen in Fig. 2(a) that the variance of the statistical ensemble increases with the variance of the dynamical ensemble and decreases with the ensemble size. These two characteristics are not specific to this experiment, but rather much more general. Indeed, the variance of the statistical ensemble can only be larger than that of the dynamical ensemble, and as the number of ensemble members increases, the distance between the best ensemble member and the observation will generally decrease towards zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence, the best-member method cannot provide reliable forecasts in a general setting.

Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations as a function of the expected amount of underdispersion (μa).

It is more difficult to explain the behaviour of the kurtosis as a function of μa and K, expressed by Fig. 2(b). In this particular case, the kurtosis of the statistical ensemble decreases with μa for μa < 0.5, whereas for μa > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μa. It can be seen that for μa < 0.5 the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence, the best-member method starts with ensemble members which already have too-heavy tails. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events

WB05, having shown that the best-member method does not produce ensembles with the correct variance, propose to dress each dynamical ensemble member with an independent error distribution, still having a zero mean, but with a covariance chosen such that the covariance between any pair of ensemble members is the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

$$ s^2 = \mathrm{mse}(y, \bar{x}) - (1 + 1/K)\, s_x^2 \qquad (4) $$

where $\mathrm{mse}(y, \bar{x}) = T^{-1} \sum_{t=1}^{T} (y_t - \bar{x}_t)^2$ is the mean square error between the observations y_t and the ensemble mean $\bar{x}_t = K^{-1} \sum_{k=1}^{K} x_{t,k}$, computed from a database of T past forecasts and observations. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.


Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor $\omega = \sqrt{s^2 / s_{\varepsilon^*}^2}$, where $s_{\varepsilon^*}^2 = (T-1)^{-1} \sum_{t=1}^{T} (\varepsilon_t^*)^2$ is the estimated variance of the best-member error. The statistical ensemble members will then be given by

$$ y_{t,k,n} = x_{t,k} + \omega \, \varepsilon_{t,k,n} \qquad (5) $$

where the innovations ε_{t,k,n} are drawn at random from a database of past forecast errors. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
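A minimal sketch of this rescaled dressing step (Eqs. (4)–(5)), assuming the target variance s² has already been estimated and with all function names ours:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility of the sketch

def dress_wb05(ensemble, best_errors, s2, n_per_member):
    """WB05-style dressing: rescale archived best-member errors so that their
    variance matches the target s2 from Eq. (4), then add resampled, rescaled
    errors to each dynamical member as in Eq. (5)."""
    eps = np.asarray(best_errors, dtype=float)
    omega = np.sqrt(s2 / eps.var(ddof=1))            # rescaling factor
    draws = rng.choice(eps, size=(len(ensemble), n_per_member))
    return np.asarray(ensemble, float)[:, None] + omega * draws
```

Note that when s² < 0 (an already overdispersive EPS) the square root is undefined, which is precisely the failure mode of the WB05 method discussed above.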

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and round-off errors. Figure 4(b) shows, however, that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence, the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, ..., u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence, member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider for example an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence, the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member, otherwise it would not be the best member of the ensemble.

Hence, the probability that an ensemble member is the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured for example by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members, and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS, we can sort the dynamical ensemble forecasts, and estimate for ranked ensemble members the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_{t,(k)} to be the kth order statistic of an ensemble X_t = {x_{t,k}, k = 1, 2, ..., K}, and ε*_(k) = {y_t − x*_t | x*_t = x_{t,(k)}, t = 1, 2, ..., T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_{t,(k)}, i.e. p_k = Pr[x*_t = x_{t,(k)}].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μa = 0.3 (an underdispersive ensemble) and for μa = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μa: as predicted, the lowest and highest ensemble members have higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μa = 0.3 and μa = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence, it does not make a lot of sense to dress each ensemble member with the same error distribution.
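The quantities p_k and ε*_(k) can be estimated from an archive of past forecasts along the following lines (a sketch, assuming the absolute error is the norm used to identify the best member; names are ours):

```python
import numpy as np

def rank_error_stats(forecasts, obs):
    """From an archive of T past K-member forecasts (T x K array) and the
    corresponding observations (length-T array), estimate p_k (the probability
    that the k-th order statistic is the best member) and collect the
    best-member errors eps*_(k) for each rank k."""
    X = np.sort(np.asarray(forecasts, float), axis=1)   # order statistics x_t(k)
    y = np.asarray(obs, dtype=float)
    best = np.abs(X - y[:, None]).argmin(axis=1)        # rank of the best member at each t
    K = X.shape[1]
    p = np.bincount(best, minlength=K) / len(y)         # empirical p_k
    errors = [y[best == k] - X[best == k, k] for k in range(K)]  # eps*_(k) archives
    return p, errors
```

The per-rank means and standard deviations of the returned error archives are exactly the quantities plotted in Fig. 6.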

Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we resample from ε*_(k) to obtain dressed ensemble members:

$$ y_{t,k,n} = x_{t,(k)} + \omega \, \varepsilon_{t,(k),n} \qquad (6) $$

where ε_{t,(k),n} is drawn at random from ε*_(k).

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_{t,(k)}. However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_{t,(k)}, and an independent error term ε_{t,(k),n} resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be respectively the mean and variance of x_{t,(k)}, and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_{t,(k),n}. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion w_k of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble:

$$ p(y) = \sum_{k=1}^{K} w_k \, p(y_{(k)}) \qquad (7) $$

It can be shown that the variance σ²(y) of this finite mixture is given by

$$ \sigma^2(y) = \sum_{k=1}^{K} w_k \{\sigma^2(x_{(k)}) + \sigma^2(\varepsilon_{(k)}) + (\mu(x_{(k)}) + \mu(\varepsilon_{(k)}))^2\} - \left\{ \sum_{k=1}^{K} w_k (\mu(x_{(k)}) + \mu(\varepsilon_{(k)})) \right\}^2 \qquad (8) $$

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_x(k) and s²_ε(k), we can hence obtain an estimate ŝ²_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have, on average, the correct variance, we would like to find weights w_k for which ŝ²_y = s²_y, the observed variance of the observations over the same period:

$$ s_y^2 = \sum_{k=1}^{K} w_k \{s_{x(k)}^2 + s_{\varepsilon(k)}^2 + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2\} - \left\{ \sum_{k=1}^{K} w_k (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \right\}^2 \qquad (9) $$
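The finite-mixture variance of Eqs. (8)–(9) is a direct moment computation. A sketch, where `mu` and `var` hold the per-component means μ(x_(k)) + μ(ε_(k)) and variances σ²(x_(k)) + σ²(ε_(k)) (the function name is ours):

```python
import numpy as np

def mixture_variance(w, mu, var):
    """Variance of a finite mixture, Eq. (8):
    sum_k w_k (var_k + mu_k^2) - (sum_k w_k mu_k)^2,
    where mu_k and var_k are the mean and variance of the k-th component."""
    w, mu, var = (np.asarray(a, dtype=float) for a in (w, mu, var))
    return float(np.sum(w * (var + mu ** 2)) - np.sum(w * mu) ** 2)
```

The second term is the squared mixture mean, so a single-component "mixture" simply returns its own variance, while spreading weight over components with different means inflates the total variance.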

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

$$ f_B(x; \omega, \tau) = \frac{x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}}{B(\omega\tau,\ \omega - \omega\tau)}, \qquad \omega > 0, \quad 0 \leq \tau \leq 1 \qquad (10) $$

where $B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx$ is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

$$ w_k(\omega, \tau) = \frac{\int_{(k-1)/K}^{k/K} x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}\, dx}{B(\omega\tau,\ \omega - \omega\tau)} \qquad (11) $$

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

$$ \tau = \sum_{k=1}^{K} p_k \cdot \frac{2k - 1}{2K} = \frac{1}{K} \sum_{k=1}^{K} k \cdot p_k - \frac{1}{2K} \qquad (12) $$
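Eqs. (10)–(12) can be sketched as follows, with the integral in Eq. (11) approximated by a midpoint rule and the weights normalized to sum to one (function names are ours):

```python
import math

def beta_pdf(x, omega, tau):
    """Beta density in Walley's (omega, tau) parametrization, Eq. (10)."""
    a, b = omega * tau, omega * (1 - tau)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

def beta_weights(K, omega, tau, n=200):
    """Eq. (11): w_k proportional to the integral of the beta density over
    ((k-1)/K, k/K), approximated with an n-point midpoint rule per interval."""
    w = []
    for k in range(1, K + 1):
        lo, hi = (k - 1) / K, k / K
        h = (hi - lo) / n
        w.append(sum(beta_pdf(lo + (i + 0.5) * h, omega, tau) for i in range(n)) * h)
    total = sum(w)
    return [wk / total for wk in w]

def tau_from_pk(p):
    """Eq. (12): tau as the rescaled expectation of p_k (interval midpoints)."""
    K = len(p)
    return sum((k + 1) * pk for k, pk in enumerate(p)) / K - 1 / (2 * K)
```

For ω = 2 and τ = 0.5 the beta density is uniform, so the weights reduce to 1/K and the method falls back on equal weighting of the order statistics.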

In this way, the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between ŝ²_y and s²_y:

$$ \hat{\omega} = \arg\min_{\omega} \left| \sum_{k=1}^{K} w_k(\omega, \tau) \{s_{x(k)}^2 + s_{\varepsilon(k)}^2 + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2\} - \left\{ \sum_{k=1}^{K} w_k(\omega, \tau) (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \right\}^2 - s_y^2 \right| \qquad (13) $$
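Since Eq. (13) is a one-dimensional minimization, a simple grid search is one way to solve it. The sketch below takes an arbitrary weight function w_k(ω) (such as the beta-integral weights of Eq. (11)) and the per-component moments as inputs; all names are ours:

```python
import numpy as np

def calibrate_omega(weight_fn, comp_mean, comp_var, s2_target,
                    grid=np.geomspace(0.1, 100.0, 400)):
    """Eq. (13): pick omega minimizing |s2_hat(omega) - s2_y|, where s2_hat is
    the mixture variance implied by the weights w_k(omega, tau).
    `weight_fn(omega)` must return the K weights; `comp_mean` and `comp_var`
    hold x_(k) + eps_(k) means and s2_x(k) + s2_eps(k) variances."""
    mu = np.asarray(comp_mean, dtype=float)
    var = np.asarray(comp_var, dtype=float)

    def s2_hat(om):
        w = np.asarray(weight_fn(om), dtype=float)
        return np.sum(w * (var + mu ** 2)) - np.sum(w * mu) ** 2

    return min(grid, key=lambda om: abs(s2_hat(om) - s2_target))
```

A logarithmic grid is natural here since, as Fig. 9 shows, the optimized ω can span several orders of magnitude.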

(c) The weighted members method applied to the synthetic EPS

We have applied the method proposed in this section to the synthetic EPS of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μa = 0.3 and K = 20 is β2 = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications of the first 100 ensemble forecasts (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).


Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.
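The archive-size requirement N_k = p_k · M can be checked mechanically before generating the statistical ensemble; the sketch below is illustrative (the function and argument names are not from the paper).

```python
def archive_large_enough(p, archives, M):
    """For each rank k, check that the expected number of draws N_k = p_k * M
    does not exceed the size of that rank's error archive, so that the
    resampling need not repeat identical statistical members."""
    return {k: p[k] * M <= len(archives[k]) for k in p}
```

A rank whose check fails signals that the calibration period is too short for the requested statistical ensemble size.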



Figure 10. Location of the Chateauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

$$
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \{ F(x) - H(x - y) \}^2 \, \mathrm{d}x \qquad (14)
$$

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
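When F is the empirical CDF of an M-member ensemble, the integral in Eq. (14) has a well-known closed form, CRPS = (1/M) Σᵢ |xᵢ − y| − (1/2M²) Σᵢ Σⱼ |xᵢ − xⱼ|, which avoids numerical integration. A minimal sketch (illustrative, not the authors' code):

```python
def crps_ensemble(members, obs):
    """CRPS of an empirical (ensemble) forecast against a scalar observation,
    via the identity CRPS = E|X - y| - 0.5 * E|X - X'| for the empirical CDF."""
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (2.0 * m * m)
    return term1 - term2
```

Averaging this quantity over many forecast-observation pairs scores the system; for a one-member ensemble it reduces to the absolute error, consistent with the CRPS being a generalization of the mean absolute error.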

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the mean precipitation observed during the calibration period.
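The centring step can be sketched as follows; the function and variable names are illustrative, and the climatological means are assumed to be estimated over the calibration period.

```python
def best_member_index(forecasts, obs, fc_mean, obs_mean):
    """Identify the best member after removing the climatological means of the
    forecasts and of the observations, so that the choice of best member is
    not driven by forecast bias."""
    centred_obs = obs - obs_mean
    errors = [abs((x - fc_mean) - centred_obs) for x in forecasts]
    return min(range(len(forecasts)), key=errors.__getitem__)
```

For example, with forecasts [1, 2, 3], an observation of 10, a forecast climatology of 2 and an observed climatology of 9, the centred observation is 1 and the third member (centred value 1) is selected, even though all raw forecasts are far from 10.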

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period. It can be seen that although the methods of RS03 and WB05 always score better than the dynamical EPS, which shows no skill for all lead times, the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used even when the calibration period is relatively short to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine whether the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamic ensemble member output x_{tk} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{tk} | θ_k), depends only on x_{tk} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

$$
p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f(y_t - x_{tk} \mid \hat{\theta}_k) \qquad (15)
$$

If the dynamic ensemble members are exchangeable prior to their observations (for example, if the same model is used for all members, from initial conditions obtained from a random perturbation of observed initial conditions), then the predictive distribution of y_t should be the same for any permutation of the dynamic ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

$$
p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{tk} \mid \theta) \qquad (16)
$$

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

$$
p(y_t \mid X_t) \approx \sum_{k=1}^{K} \pi_k \cdot f_{(k)}(y_t - x_{t(k)}) \qquad (17)
$$

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
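Drawing statistical members from the mixture in Eq. (17) needs only the weights π_k and the per-rank error archives: pick a rank with probability π_k, then add a resampled error from that rank's archive to the corresponding order statistic of the forecast. A minimal sketch, with the nonparametric f_(k) represented by resampling (names illustrative):

```python
import random

def draw_statistical_members(forecast, pi, archives, n, rng=None):
    """Draw n members from p(y|X) ~ sum_k pi_k * f_(k)(y - x_(k)) (cf. Eq. (17)):
    sample a rank k with probability pi[k], then add an error resampled from
    the kth archive to the kth order statistic of the forecast."""
    rng = rng or random.Random(42)
    ranked = sorted(forecast)
    ranks = list(range(1, len(ranked) + 1))
    members = []
    for _ in range(n):
        k = rng.choices(ranks, weights=[pi[k] for k in ranks])[0]
        members.append(ranked[k - 1] + rng.choice(archives[k]))
    return members
```

Because the kernel is a resampled archive rather than a fitted density, the size and diversity of each rank's archive directly limits the diversity of the statistical members, which is the limitation discussed in section 5(d).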

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, and which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y., 2005: A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097.

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R., 2004: The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262.

Collier, C. G. and Krzysztofowicz, R., 2000: Quantitative precipitation forecasting. J. Hydrol., 239, 1–2.

Davis, D. R., Duckstein, L. and Krzysztofowicz, R., 1979: The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742.

Dempster, A. P., Laird, N. M. and Rubin, D. B., 1977: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38.

Eckel, F. A., 2003: 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington.

Feddersen, H. and Andersen, U., 2005: A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408.

Gander, W. and Gautschi, W., 2000: Adaptive quadrature revisited. BIT Numerical Mathematics, 40, 84–101.

Genest, C. and Favre, A.-C., 2006: Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press.

Hall, P. and Zhou, X.-H., 2003: Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224.

Hamill, T. M., Whitaker, J. S. and Mullen, S. L., 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46.

Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570.

Houdant, B., 2004: 'Contribution à l'amélioration de la prévision hydrométéorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris.

Jensen, F. V., 2001: Bayesian Networks and Decision Graphs. Springer-Verlag, New York.

Krzysztofowicz, R., 2004: 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R., 2004: Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906.

Lindley, D. V. and Novick, M. R., 1981: The role of exchangeability in inference. Ann. Statistics, 9, 45–58.

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M., 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174.

Roulston, M. S. and Smith, L. A., 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30.

Sivillo, J. K., Ahlquist, J. E. and Toth, Z., 1997: An ensemble forecasting primer. Weather and Forecasting, 12, 809–818.

Walley, P., 1996: Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., Series B, 58, 3–57.

Wang, X. and Bishop, C. H., 2005: Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986.

Weisstein, E. W., 2002: CRC Concise Encyclopedia of Mathematics, 2nd edition. CRC Press.



Figure 3. Difference between the kurtosis of the dynamical ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a).

zero, so that the variance of the statistical ensemble will tend towards the variance of the dynamical ensemble for large ensemble sizes. Hence the best-member method cannot provide reliable forecasts in a general setting.

It is more difficult to explain the behaviour of the kurtosis as a function of μ_a and K expressed by Fig. 2(b). In this particular case, the kurtosis of the statistical ensemble decreases with μ_a for μ_a < 0.5, whereas for μ_a > 0.5 it increases with K. This peculiar behaviour is caused by the fact that the dynamical ensemble members are not drawn from the same distribution as the observations, and in particular do not have the same kurtosis. Figure 3 shows the kurtosis of the EPS outputs as a function of μ_a. It can be seen that for μ_a < 0.5 the kurtosis of the dynamical ensemble members is already much larger than the kurtosis of the observations. Hence the best-member method starts with ensemble members which already have too-heavy tails. Unfortunately, not only does the best-member method fail to provide ensemble members with the correct variance, but it generally increases the kurtosis, which will lead to a systematic overestimation of the probability of extreme events.

(b) Rescaling the variance leads to the overestimation of the probability of extreme events

WB05, having shown that the best-member method does not produce ensembles with the correct variance, propose to dress each dynamical ensemble member with an independent error distribution, still having a zero mean, but with a covariance chosen such that the covariance between any pair of ensemble members is the same as the covariance between one ensemble member and the observations. For scalar observations, they propose the following estimator for the variance of the error distribution:

$$
s^2 = \mathrm{mse}(y, \bar{x}) - (1 + 1/K) \cdot s_x^2 \qquad (4)
$$

where $\mathrm{mse}(y, \bar{x}) = T^{-1} \sum_{t=1}^{T} (y_t - \bar{x}_t)^2$ is the mean square error between the observations $y_t$ and the ensemble mean $\bar{x}_t = K^{-1} \sum_{k=1}^{K} x_{kt}$, computed from a database of T past forecasts and observations. If the dynamical ensemble members were already overdispersive, then s² will be negative, so that the method of WB05 is not applicable.



Figure 4. (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor $\omega = \sqrt{s^2 / s^2_{\varepsilon^*}}$, where $s^2_{\varepsilon^*} = (T - 1)^{-1} \sum_{t=1}^{T} (\varepsilon^*_t)^2$ is the estimated variance of the best-member error. The statistical ensemble members will then be given by

$$
y_{tkn} = x_{tk} + \omega \cdot \varepsilon_{tkn} \qquad (5)
$$

where the innovations ε_{tkn} are drawn at random from a database of past forecast errors. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.
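Equations (4) and (5) can be sketched as follows; `mse_mean` and `var_x` stand for the two training-period statistics in Eq. (4), and the innovations are resampled from an archive of past best-member errors. All names are illustrative, and a negative s² simply signals that the rescaling does not apply.

```python
import math
import random

def wb05_dressing(mse_mean, var_x, errors_archive, forecast, n_dress, rng=None):
    """Dress the members of one dynamical forecast with rescaled best-member
    errors, following Eqs. (4) and (5)."""
    K = len(forecast)
    s2 = mse_mean - (1.0 + 1.0 / K) * var_x                # Eq. (4)
    if s2 < 0.0:
        raise ValueError("ensemble already overdispersive: rescaling not applicable")
    T = len(errors_archive)
    s2_err = sum(e * e for e in errors_archive) / (T - 1)  # variance of archived errors
    omega = math.sqrt(s2 / s2_err)                          # rescaling factor
    rng = rng or random.Random(0)
    # Eq. (5): y_tkn = x_tk + omega * eps_tkn, innovations resampled from the archive
    return [x + omega * rng.choice(errors_archive)
            for x in forecast for _ in range(n_dress)]
```

Raising an error for negative s² mirrors the limitation stated in the text: the method of WB05 cannot be used when the dynamical ensemble is already overdispersive.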

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and roundoff errors. Figure 4(b) shows, however, that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, ..., u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider, for example, an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member; otherwise it would not be the best member of the ensemble.

Hence the probability that an ensemble member will be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured, for example, by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_t(k) to be the kth order statistic of an ensemble X_t = {x_tk, k = 1, 2, ..., K}, and

\varepsilon^*_{(k)} = \{\, y_t - x^*_t \mid x^*_t = x_{t(k)},\ t = 1, 2, \ldots, T \,\}

to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_t(k), i.e. p_k = Pr[x*_t = x_t(k)].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μa = 0.3 (an underdispersive ensemble) and for μa = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μa: as predicted, the lowest and highest ensemble members have higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μa = 0.3 and μa = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence it does not make much sense to dress each ensemble member with the same error distribution.
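The rank-conditional quantities p_k and ε*_(k) introduced above can be estimated by brute force from a forecast archive. The sketch below (variable names are hypothetical; a toy Gaussian EPS stands in for the synthetic system of WB05) sorts each ensemble, finds the rank of the best member, and accumulates the errors per rank:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 5000, 20                       # archive length, ensemble size

# Toy archive: truth y_t and an underdispersive K-member ensemble X_t
y = rng.normal(size=T)
X = 0.5 * rng.normal(size=(T, K))     # spread too small -> underdispersive

X_sorted = np.sort(X, axis=1)         # order statistics x_t(k)
best_rank = np.argmin(np.abs(X_sorted - y[:, None]), axis=1)  # rank of best member

p_k = np.bincount(best_rank, minlength=K) / T  # Pr[best member has rank k]

# Rank-conditional best-member error archives eps*_(k)
eps_star = [y[best_rank == k] - X_sorted[best_rank == k, k] for k in range(K)]

# For an underdispersive EPS, p_k is U-shaped: extreme ranks win far more often
print(p_k[0], p_k[K // 2])
```

With an overdispersive toy ensemble (e.g. a spread factor larger than one), the same code yields the bell-shaped p_k described in the text.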

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we instead resample


Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution, as a function of the rank k of the member in the ordered ensemble, for (a) μa = 0.3 and (b) μa = 1.7.

from ε*_(k) to obtain the dressed ensemble members:

y_{tkn} = x_{t(k)} + \omega \cdot \varepsilon_{t(k)n}    (6)

where ε_t(k)n is drawn at random from ε*_(k).
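In code, Eq. (6) is a one-line resampling step. A minimal sketch (names are hypothetical; `eps_star_k` holds the archive ε*_(k) of best-member errors for rank k, and `omega` is the rescaling factor):

```python
import numpy as np

rng = np.random.default_rng(1)

def dress_member(x_tk, eps_star_k, n, omega=1.0):
    """Generate n statistical members from the k-th order statistic x_tk
    by adding resampled rank-specific best-member errors (Eq. 6)."""
    errors = rng.choice(np.asarray(eps_star_k), size=n, replace=True)
    return x_tk + omega * errors

# Toy usage: dress one order statistic with a small hypothetical error archive
eps_star_1 = np.array([-0.4, 0.1, 0.9, 1.3, 0.2])
members = dress_member(x_tk=2.0, eps_star_k=eps_star_1, n=8)
print(members.shape)   # (8,)
```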

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_t(k). However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_t(k), and an independent error term ε_t(k)n resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be respectively the mean and variance of x_t(k), and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_t(k)n. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble, w_k:

p(y) = \sum_{k=1}^{K} w_k \cdot p(y_{(k)})    (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

\sigma^2(y) = \sum_{k=1}^{K} w_k \{\sigma^2(x_{(k)}) + \sigma^2(\varepsilon_{(k)}) + (\mu(x_{(k)}) + \mu(\varepsilon_{(k)}))^2\} - \Bigl\{ \sum_{k=1}^{K} w_k (\mu(x_{(k)}) + \mu(\varepsilon_{(k)})) \Bigr\}^2    (8)
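Equation (8) is the standard variance of a finite mixture, and it can be checked against direct simulation. A sketch under toy assumptions (Gaussian components with arbitrary means and variances; the names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
w  = np.array([0.1, 0.4, 0.3, 0.2])   # mixture weights w_k
mu = np.array([-1.0, 0.0, 0.5, 2.0])  # stands for mu(x_(k)) + mu(eps_(k))
s2 = np.array([0.2, 0.5, 0.3, 1.0])   # stands for sigma^2(x_(k)) + sigma^2(eps_(k))

# Eq. (8): Var(y) = sum_k w_k (s2_k + mu_k^2) - (sum_k w_k mu_k)^2
var_formula = np.sum(w * (s2 + mu**2)) - np.sum(w * mu) ** 2

# Monte Carlo check: draw component k with probability w_k, then y ~ N(mu_k, s2_k)
n = 400_000
comp = rng.choice(K, size=n, p=w)
y = rng.normal(mu[comp], np.sqrt(s2[comp]))
print(var_formula, y.var())   # the two agree up to sampling noise
```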

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_x(k) and s²_ε(k), we can hence obtain an estimate s²_ŷ of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have, on average, the correct variance, we would like to find weights w_k for which s²_ŷ = s²_y, the observed variance of the observations over the same period:

s_y^2 = \sum_{k=1}^{K} w_k \{ s^2_{x(k)} + s^2_{\varepsilon(k)} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \} - \Bigl\{ \sum_{k=1}^{K} w_k (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \Bigr\}^2    (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; \omega, \tau) = \frac{x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}}{B(\omega\tau,\ \omega - \omega\tau)}, \qquad \omega > 0,\ 0 \le \tau \le 1    (10)

where B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\, dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(\omega, \tau) = \frac{\int_{(k-1)/K}^{k/K} x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}\, dx}{B(\omega\tau,\ \omega - \omega\tau)}    (11)

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

\tau = \sum_{k=1}^{K} \int_{(k-1)/K}^{k/K} K x\, p_k \, dx = \frac{1}{K} \sum_{k=1}^{K} k \cdot p_k - \frac{1}{2K}    (12)

In this way the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between s²_ŷ and s²_y:

\omega = \arg\min_{\omega} \Bigl| \sum_{k=1}^{K} w_k(\omega, \tau) \{ s^2_{x(k)} + s^2_{\varepsilon(k)} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \} - \Bigl\{ \sum_{k=1}^{K} w_k(\omega, \tau) (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \Bigr\}^2 - s_y^2 \Bigr|    (13)
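Equations (11)–(13) can be implemented with ordinary numerical integration. The sketch below uses hypothetical helper names, a midpoint rule standing in for the incomplete beta function (normalization makes the beta function B itself unnecessary, since the weights must sum to one), and a grid search standing in for the arg-min:

```python
import numpy as np

def beta_weights(omega, tau, K, m=200):
    """w_k from Eq. (11): mass of a Beta(omega*tau, omega - omega*tau)
    density falling in [(k-1)/K, k/K], by midpoint integration."""
    a, b = omega * tau, omega * (1.0 - tau)
    x = (np.arange(K * m) + 0.5) / (K * m)           # midpoints on (0, 1)
    dens = x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)   # unnormalized beta pdf
    w = dens.reshape(K, m).sum(axis=1)
    return w / w.sum()                               # normalization replaces 1/B(a, b)

def fit_omega(tau, K, mu_k, s2_k, s2_target, grid=np.linspace(0.1, 100.0, 2000)):
    """Eq. (13): pick omega so the mixture variance matches s2_target.
    mu_k, s2_k are per-rank means/variances of x_(k) + eps_(k)."""
    def mix_var(omega):
        w = beta_weights(omega, tau, K)
        return np.sum(w * (s2_k + mu_k ** 2)) - np.sum(w * mu_k) ** 2
    return grid[np.argmin([abs(mix_var(o) - s2_target) for o in grid])]

# Toy usage (hypothetical per-rank statistics, symmetric case tau = 0.5)
K = 20
mu_k = np.linspace(-2.0, 2.0, K)
s2_k = np.full(K, 0.1)
print(fit_omega(0.5, K, mu_k, s2_k, s2_target=0.5))  # small target: bell-shaped, omega > 2
print(fit_omega(0.5, K, mu_k, s2_k, s2_target=3.0))  # large target: U-shaped, omega < 2
```

This reproduces the behaviour described in the text: reducing the mixture variance pushes weight towards the median (ω > 2), while inflating it pushes weight towards the extreme order statistics (ω < 2).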

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement in the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μa = 0.3 and K = 20 is β2 = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts obtained using the method of WB05 (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μa) and of the size of the dynamical ensemble (K).

Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and of those obtained using the weighted members method proposed in this paper (scored line), for μa = 0.3 and K = 20.

Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood by looking at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.


Figure 10. Location of the Chateauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both the sharpness and the reliability of a probabilistic forecast (Hersbach 2000):

\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \{ F(x) - H(x - y) \}^2 \, dx    (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
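When F is the empirical cumulative distribution function of an M-member ensemble, the integral in Eq. (14) admits a convenient closed form, CRPS = E|X − y| − ½ E|X − X′|, with X, X′ drawn independently from the ensemble; this also makes the link with the mean absolute error explicit (a degenerate ensemble recovers |x − y|). A sketch:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast (Eq. 14 with F the empirical CDF):
    E|X - y| - 0.5 * E|X - X'| over ensemble members X."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

# Degenerate ensemble: CRPS reduces to the absolute error
print(crps_ensemble([3.0, 3.0, 3.0], 1.0))   # 2.0
```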

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the mean precipitation observed during the calibration period.
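The centring recipe described above can be sketched in a few lines (names hypothetical; means are taken over the calibration period). A constant forecast bias then has no influence on which member is identified as best:

```python
import numpy as np

def identify_best_members(X, y):
    """Identify the best member of each forecast after removing the mean
    bias of forecasts and observations over the calibration period.
    X: (T, K) array of forecasts; y: (T,) array of observations."""
    Xc = X - X.mean()    # centre forecasts
    yc = y - y.mean()    # centre observations
    return np.argmin(np.abs(Xc - yc[:, None]), axis=1)

# Toy check: a constant bias does not change which member is 'best'
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 15))
y = rng.normal(size=100)
assert np.array_equal(identify_best_members(X, y),
                      identify_best_members(X + 5.0, y))
```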

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two years (January 1979 to December 1980) and (b) six years (January 1979 to December 1984). Curves shown: climatology, EPS, RS03, WB05, and the weighted members method.

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to that of climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are however not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, as well as for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine whether the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μa < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It would also be interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
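The Schaake Shuffle step mentioned above can be sketched in a few lines: the statistical members for each variable (location or lead time) are reordered so that their rank pattern reproduces that of a set of historical observations, thereby restoring space–time dependence while leaving the marginal distributions untouched. Names below are hypothetical:

```python
import numpy as np

def schaake_shuffle(members, historical):
    """Reorder each column of `members` (M samples x D variables) so that
    its rank pattern matches that of `historical` (M x D), in the spirit
    of the Schaake Shuffle (Clark et al. 2004)."""
    out = np.empty_like(members)
    for d in range(members.shape[1]):
        ranks = np.argsort(np.argsort(historical[:, d]))  # rank of each row
        out[:, d] = np.sort(members[:, d])[ranks]         # sorted values placed by rank
    return out
```

Because only the ordering changes, the per-variable distributions produced by the weighted members method are preserved exactly; only the dependence structure between variables is borrowed from the historical record.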

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_tk as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_tk | θ_k), depends only on x_tk and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation–Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f(y_t - x_{tk} \mid \hat{\theta}_k)    (15)

If the dynamical ensemble members are exchangeable a priori, for example if the same model is used for all members, with initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{tk} \mid \hat{\theta})    (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f_{(k)}(y_t - x_{t(k)})    (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture: joint estimation of the mixing proportions and probability density functions is not feasible.

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful for estimating the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262

Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38

Eckel, F. A. 2003 'Effective mesoscale short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W. 2000 Adaptive quadrature – revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)

Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57

Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press

1356 V FORTIN et al


Figure 4 (a) Ratio of the variance of the statistical ensemble members generated by the modified best-member method of WB05 to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

To dress individual ensemble members with an error distribution having a variance equal to s², one can simply multiply best-member errors ε*_t, drawn from an archive of past forecasts, by a factor ω = √(s²/s²_ε*), where s²_ε* = (T − 1)⁻¹ Σ_{t=1}^{T} (ε*_t)² is the estimated variance of the best-member error. The statistical ensemble members will then be given by

y_tkn = x_tk + ω · ε_tkn    (5)

where the innovations ε_tkn are drawn at random from a database of past forecast errors. The parameter ω will be larger than one if the best-member method produced underdispersive ensemble members, and smaller than one otherwise.

Figure 4(a) shows that the variance produced by the modified best-member method of WB05 is very close to the variance of the observations, departures being likely caused by sampling uncertainty and round-off errors. Figure 4(b) shows, however, that this improvement in the variance of the dressed ensemble members comes at the cost of an increased kurtosis. Hence the method proposed by WB05 leads to predictive distributions which have much heavier tails than the observations, which will lead to an overestimation of the probability of extreme events.

4. DRESSING AND WEIGHTING EACH MEMBER DIFFERENTLY: THE WEIGHTED MEMBERS METHOD

In the best-member method, each ensemble member is dressed using the same error distribution. This seems to make sense if all ensemble members are exchangeable prior to their observation, as there is no a priori reason for assuming that any ensemble member has any more chance of being the best member of the ensemble, or that its error distribution should be any different from that of the other members if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, ..., u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider for example an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence the extrema of the ensemble will have a much greater chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member; otherwise it would not be the best member of the ensemble.

Hence the probability that an ensemble member be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured for example by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_t(k) to be the kth order statistic of an ensemble X_t = {x_tk, k = 1, 2, ..., K}, and ε*_(k) = {y_t − x*_t | x*_t = x_t(k), t = 1, 2, ..., T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_t(k), i.e. p_k = Pr[x*_t = x_t(k)].

Figure 5 shows p_k as a function of k for K = 20 ensemble members, for μ_a = 0.3 (an underdispersive ensemble) and for μ_a = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μ_a: as predicted, the lowest and highest ensemble members have a higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μ_a = 0.3 and μ_a = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence it does not make a lot of sense to dress each ensemble member with the same error distribution.
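The rank statistics discussed above can be estimated directly from an archive of past ensembles and verifying observations. The sketch below is our own construction (assuming NumPy and the absolute difference as the best-member norm); it builds p_k and the per-rank error archives ε*_(k) from a toy underdispersive EPS:

```python
import numpy as np

def rank_statistics(forecasts, observations):
    """Estimate p_k and the per-rank best-member error archives eps*_(k).

    forecasts    : (T, K) array of past dynamical ensembles
    observations : (T,)  matching verifying observations
    Returns (p, errors): p[k] is the probability that the best member is the
    (k+1)th order statistic; errors[k] collects y_t - x_t(k) for those cases.
    """
    T, K = forecasts.shape
    sorted_f = np.sort(forecasts, axis=1)            # order statistics x_t(k)
    best = np.argmin(np.abs(sorted_f - observations[:, None]), axis=1)
    p = np.bincount(best, minlength=K) / T
    errors = [observations[best == k] - sorted_f[best == k, k] for k in range(K)]
    return p, errors

# Toy underdispersive EPS: ensemble spread much smaller than the forecast error
rng = np.random.default_rng(1)
truth = rng.normal(size=(4000, 1))
ens = 0.3 * truth + 0.3 * rng.normal(size=(4000, 8))
p, errs = rank_statistics(ens, truth[:, 0])
```

On this toy ensemble the estimated p_k is U-shaped, with most of the probability on the lowest and highest ranks, as in Fig. 5(a).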

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we resample


Figure 5 Probability that the member of rank k is the best member of the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.


Figure 6 Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

from ε*_(k), to obtain dressed ensemble members

y_tkn = x_t(k) + ω · ε_t(k)n    (6)

where ε_t(k)n is drawn at random from ε*_(k).
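Equation (6) can then be implemented by resampling, for each rank, from that rank's own error archive. A minimal sketch assuming NumPy; the per-rank archives below are hypothetical numbers chosen only to mimic the inward bias and larger spread of the extreme ranks seen in Fig. 6(a):

```python
import numpy as np

def dress_by_rank(ensemble, rank_archives, n_per_member, omega, rng):
    """Dress the kth order statistic with errors resampled from eps*_(k) (Eq. 6)."""
    dressed = []
    for k, x in enumerate(np.sort(np.asarray(ensemble, dtype=float))):
        eps = rng.choice(np.asarray(rank_archives[k], dtype=float),
                         size=n_per_member, replace=True)
        dressed.append(x + omega * eps)    # y_tkn = x_t(k) + omega * eps_t(k)n
    return np.concatenate(dressed)

rng = np.random.default_rng(2)
# Hypothetical per-rank error archives: extremes biased inward, larger spread
rank_archives = [rng.normal(m, s, size=300) for m, s in
                 [(0.8, 1.0), (0.0, 0.5), (0.0, 0.5), (-0.8, 1.0)]]
dressed = dress_by_rank([2.0, -1.0, 0.5, 1.0], rank_archives,
                        n_per_member=500, omega=1.0, rng=rng)
```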

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_t(k). However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_t(k), and an independent error term ε_t(k)n resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be respectively the mean and variance of x_t(k), and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_t(k)n. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble, w_k:

p(y) = Σ_{k=1}^{K} w_k · p(y_(k))    (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

σ²(y) = Σ_{k=1}^{K} w_k {σ²(x_(k)) + σ²(ε_(k)) + (μ(x_(k)) + μ(ε_(k)))²} − (Σ_{k=1}^{K} w_k (μ(x_(k)) + μ(ε_(k))))²    (8)

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_x(k) and s²_ε(k), we can hence obtain an estimate ŝ²_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have on average the correct variance, we would like to find weights w_k for which ŝ²_y = s²_y, the observed variance of the observations over the same period:

s²_y = Σ_{k=1}^{K} w_k {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − (Σ_{k=1}^{K} w_k (x̄_(k) + ε̄_(k)))²    (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; ω, τ) = x^(ω·τ−1) (1 − x)^(ω−ω·τ−1) / B(ω·τ, ω − ω·τ),    ω > 0, 0 ≤ τ ≤ 1    (10)

where B(α, β) = ∫₀¹ x^(α−1) (1 − x)^(β−1) dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(ω, τ) = ∫_{(k−1)/K}^{k/K} x^(ω·τ−1) (1 − x)^(ω−ω·τ−1) dx / B(ω·τ, ω − ω·τ)    (11)

The expectation of the beta distribution being given by τ, we propose to further constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

τ = Σ_{k=1}^{K} p_k · K ∫_{(k−1)/K}^{k/K} x dx = (1/K) Σ_{k=1}^{K} k · p_k − 1/(2K)    (12)

In this way the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between ŝ²_y and s²_y:

ω̂ = argmin_ω | Σ_{k=1}^{K} w_k(ω, τ) · {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − (Σ_{k=1}^{K} w_k(ω, τ) · (x̄_(k) + ε̄_(k)))² − s²_y |    (13)

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7 (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).


Figure 8 A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9 Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.


Figure 10 Location of the Châteauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Châteauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the


methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{∞} (F(x) − H(x − y))² dx    (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
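For an ensemble forecast, the integral in Eq. (14) with F taken as the empirical CDF has a closed form, so no quadrature is needed. A sketch (our own function name, assuming NumPy) using the equivalent kernel representation CRPS = E|X − y| − ½ E|X − X′|:

```python
import numpy as np

def crps_ensemble(members, y):
    """CRPS of an ensemble forecast (Eq. 14), with F the empirical CDF.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, which equals the
    integral of (F(x) - H(x - y))^2 and avoids numerical quadrature.
    """
    x = np.sort(np.asarray(members, dtype=float))
    n = x.size
    term1 = np.mean(np.abs(x - y))
    # E|X - X'| in O(n log n) from the sorted sample
    term2 = 2.0 * np.sum((2.0 * np.arange(1, n + 1) - n - 1) * x) / (n * n)
    return term1 - 0.5 * term2
```

For a single-member ensemble this reduces to the absolute error |x − y|, consistent with the interpretation of the CRPS as a generalization of the mean absolute error.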

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.
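The centring step described above can be sketched as follows (our own helper, assuming NumPy; the centring uses the overall forecast and observation means over the calibration period, so a constant forecast bias no longer decides which member is 'best'):

```python
import numpy as np

def best_member_debiased(forecasts, observations):
    """Index of the best member of each past ensemble, judged after centring.

    forecasts    : (T, K) past dynamical ensembles
    observations : (T,)  verifying observations
    """
    f = forecasts - forecasts.mean()          # centred forecasts
    o = observations - observations.mean()    # centred observations
    return np.argmin(np.abs(f - o[:, None]), axis=1)

# Member 0 tracks the observations up to a constant bias; member 1 is noisy
obs = np.array([0.0, 1.0, 2.0])
fcst = np.column_stack([obs + 5.0, obs + 5.0 + np.array([2.0, -3.0, 4.0])])
best = best_member_debiased(fcst, obs)
```

Without centring, the constant +5 bias would dominate the absolute differences; after centring, the member that tracks the observations is selected every time.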

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11 CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are however not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Châteauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case for example for hydrological forecasting, where the inputs are certainly multidimensional, but the output is very often unidimensional. In such cases we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
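For reference, the core of the Schaake Shuffle is a rank-matching reordering: each variable's dressed members are sorted and then rearranged so that their ranks across members reproduce the ranks observed on a set of historical dates. A minimal sketch of that reordering step (our own reading of Clark et al. (2004), assuming NumPy; it omits the selection of the historical dates):

```python
import numpy as np

def schaake_shuffle(members, historical):
    """Rank-matching reordering at the core of the Schaake Shuffle.

    members, historical : (M, D) arrays; M dressed members / historical dates,
    D variables (sites or lead times). Column d of the output holds the sorted
    member values of column d, rearranged so that their ranks across members
    reproduce the ranks of the historical observations.
    """
    out = np.empty_like(np.asarray(members, dtype=float))
    for d in range(out.shape[1]):
        ranks = np.argsort(np.argsort(historical[:, d]))  # rank of each date
        out[:, d] = np.sort(members[:, d])[ranks]
    return out
```

The marginal distribution of each column is untouched (the same values reappear, reordered); only the dependence between columns is borrowed from the historical record.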

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum-likelihood estimators π_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π_k · f(y_t − x_{t,k} | θ_k)    (15)

If the dynamical ensemble members are exchangeable prior to their observation (for example, if the same model is used for all members, from initial conditions obtained by a random perturbation of observed initial conditions), then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} f(y_t − x_{t,k} | θ)    (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details: mainly, the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.
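For concreteness, the estimation step of BMA can be sketched with Gaussian kernels sharing a common spread, a simplified variant of the set-up of Raftery et al. (2005); the function below and its toy test are our own illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def bma_em(X, y, n_iter=200):
    """EM for Gaussian BMA: p(y_t | X_t) = sum_k pi_k N(y_t; x_tk, sigma^2).
    X: (T, K) member forecasts; y: (T,) verifying observations."""
    T, K = X.shape
    pi = np.full(K, 1.0 / K)
    sigma = np.std(y - X.mean(axis=1)) + 1e-6
    for _ in range(n_iter):
        # E-step: responsibility of member k for observation t
        z = pi * norm.pdf(y[:, None], loc=X, scale=sigma)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and the common spread
        pi = z.mean(axis=0)
        sigma = np.sqrt((z * (y[:, None] - X) ** 2).sum() / T)
    return pi, sigma
```

As the text notes, the EM iteration only converges to a local maximum of the likelihood; and with exchangeable members the estimated weights would all tend towards 1/K.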

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model, where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π_k · f_(k)(y_t − x_{t,(k)})    (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.

7 CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, and which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members, which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and, using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Châteauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005. A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097.

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004. The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262.

Collier, C. G. and Krzysztofowicz, R. 2000. Quantitative precipitation forecasting. J. Hydrol., 239, 1–2.

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979. The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742.

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38.

Eckel, F. A. 2003. 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington.

Feddersen, H. and Andersen, U. 2005. A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408.

Gander, W. and Gautschi, W. 2000. Adaptive quadrature revisited. BIT Numerical Mathematics, 40, 84–101.

Genest, C. and Favre, A.-C. 2006. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press.

Hall, P. and Zhou, X.-H. 2003. Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224.

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006. Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46.

Hersbach, H. 2000. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570.

Houdant, B. 2004. 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris.

Jensen, F. V. 2001. Bayesian networks and decision graphs. Springer-Verlag, New York.

Krzysztofowicz, R. 2004. 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004. Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906.

Lindley, D. V. and Novick, M. R. 1981. The role of exchangeability in inference. Ann. Statistics, 9, 45–58.

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174.

Roulston, M. S. and Smith, L. A. 2003. Combining dynamical and statistical ensembles. Tellus, 55A, 16–30.

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997. An ensemble forecasting primer. Weather and Forecasting, 12, 809–818.

Walley, P. 1996. Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., Series B, 58, 3–57.

Wang, X. and Bishop, C. H. 2005. Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986.

Weisstein, E. W. 2002. CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press.


error distribution should be any different from that of the other members, if it indeed happens to be the best member. Recall that a set of random variables u = {u_1, ..., u_K} are exchangeable if their joint probability density function p(u) does not depend on their order, i.e. if for any permutation u′ of the elements of u we have p(u) = p(u′) (Lindley and Novick 1981). In other words, the forecasts can be reordered and renumbered without losing any information on the predictand. However, we must not forget that the values taken by all ensemble members are known at the time an ensemble member gets dressed with an error distribution. Hence, member dressing is performed a posteriori, when the ensemble members are not exchangeable any more.

Consider, for example, an EPS which produces underdispersive dynamical ensembles. Because they are underdispersive, the outcome will often lie outside the spread of the ensemble. Hence, the extrema of the ensemble will have much more chance of being the best member of the ensemble than a member which is close to the ensemble mean. Conversely, if the EPS is highly overdispersive, members close to the ensemble mean will have more chance of being the best members of the ensemble than the extrema of the ensemble. Furthermore, if an ensemble member close to a mode of the ensemble empirical distribution happens to be the best member of the ensemble, the error made will tend to be small, as it must be smaller than half the distance to the closest ensemble member; otherwise it would not be the best member of the ensemble.

Hence, the probability that an ensemble member be the best member of the ensemble, as well as the error distribution of the best member of the ensemble, can both depend on the location of the ensemble member within the ensemble. For multivariate forecasts, this location can be measured, for example, by the distance to the ensemble mean, using the norm selected to identify the best member of the ensemble. For univariate forecasts, which are the focus of this paper, a simpler solution exists: we can simply sort the ensemble members, and take into account the rank of a member in the sorted ensemble when dressing it with an error distribution.
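The rank-dependence of the best-member probability is easy to reproduce by simulation. The sketch below uses a hypothetical toy EPS (not the exact synthetic system of WB05): the ensemble centre is the truth plus a unit-variance error, and the members scatter around the centre with standard deviation μ_a, so that μ_a < 1 gives an underdispersive ensemble.

```python
import numpy as np

def best_member_rank_probs(mu_a, K=20, T=50_000, seed=1):
    """Estimate p_k = Pr[best member is the k-th order statistic] for a
    toy EPS: centre m_t = y_t + e_t with e_t ~ N(0,1), and members
    x_tk = m_t + mu_a * xi_tk, so mu_a < 1 is underdispersive."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=T)
    m = y + rng.normal(size=T)
    X = np.sort(m[:, None] + mu_a * rng.normal(size=(T, K)), axis=1)
    best = np.abs(X - y[:, None]).argmin(axis=1)  # rank of the best member
    return np.bincount(best, minlength=K) / T

p_under = best_member_rank_probs(0.3)  # U-shaped: extreme ranks most often best
p_over = best_member_rank_probs(1.7)   # bell-shaped: middle ranks most often best
```

The same simulation also yields rank-conditional error archives, by storing y_t − x_{t,(k)} for the winning rank k of each forecast.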

(a) Dressing each ensemble member with a different kernel

Going back to the simulations performed with the synthetic EPS system, we can sort the dynamical ensemble forecasts, and estimate, for ranked ensemble members, the probability of being the best member of the ensemble, as well as the mean and variance of the error distribution. Define x_{t,(k)} to be the kth order statistic of an ensemble X_t = {x_{t,k}, k = 1, 2, ..., K}, and ε*_(k) = {y_t − x*_t | x*_t = x_{t,(k)}, t = 1, 2, ..., T} to be the best-member errors observed in the database of past forecasts when the best member was the kth order statistic. Define also p_k to be the probability that the best member is x_{t,(k)}, i.e. p_k = Pr[x*_t = x_{t,(k)}].

Figure 5 shows p_k as a function of k, for K = 20 ensemble members, for μ_a = 0.3 (an underdispersive ensemble) and for μ_a = 1.7 (an overdispersive ensemble). It can be seen that p_k depends on k as well as on μ_a: as predicted, the lowest and highest ensemble members have a higher (lower) probability of being the best member of an underdispersive (overdispersive) ensemble. Figure 6 presents the mean and standard deviation of ε*_(k), again for μ_a = 0.3 and μ_a = 1.7. It can be seen that the error distribution is biased when the best member is one of the lowest or one of the highest members of the ensemble, and that the standard deviation of the error is larger in these cases. Hence, it does not make a lot of sense to dress each ensemble member with the same error distribution.

[Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.]

[Figure 6. Mean and standard deviation of the best-member error distribution, as a function of the rank k of the member in the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.]

When dressing the kth member in the ordered ensemble, we therefore propose that, instead of resampling from the archive of all best-member errors, we instead resample from ε*_(k) to obtain the dressed ensemble members:

y_{t,k,n} = x_{t,(k)} + ω · ε_{t,(k),n}    (6)

where ε_{t,(k),n} is drawn at random from ε*_(k).
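A minimal sketch of this rank-conditional dressing step (Eq. (6)), assuming the rank-specific error archives and the probabilities used to pick each order statistic are already available; the function and argument names are ours, and ω is the rescaling factor of Eq. (6).

```python
import numpy as np

def dress_ordered_members(X, err_archive, probs, M=1000, omega=1.0, seed=0):
    """Draw M statistical members: pick an order statistic k with
    probability probs[k], then add a resampled error from the
    rank-specific archive err_archive[k], rescaled by omega (Eq. (6))."""
    rng = np.random.default_rng(seed)
    x_sorted = np.sort(np.asarray(X, dtype=float))
    ks = rng.choice(len(x_sorted), size=M, p=probs)
    eps = np.array([rng.choice(err_archive[k]) for k in ks])
    return x_sorted[ks] + omega * eps
```

With probs set to the estimated p_k, this reduces to dressing each order statistic in proportion to its chance of being the best member; the weighted members method described next replaces p_k by calibrated weights w_k.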

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_{t,(k)}. However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member, so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_{t,(k)}, and an independent error term, ε_{t,(k),n}, resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be, respectively, the mean and variance of x_{t,(k)}, and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_{t,(k),n}. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble, w_k:

p(y) = Σ_{k=1}^{K} w_k · p(y_(k))    (7)

It can be shown that the variance σ²(y) of this finite mixture is given by:

σ²(y) = Σ_{k=1}^{K} w_k {σ²(x_(k)) + σ²(ε_(k)) + (μ(x_(k)) + μ(ε_(k)))²} − (Σ_{k=1}^{K} w_k (μ(x_(k)) + μ(ε_(k))))²    (8)

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_x(k) and s²_ε(k), we can hence obtain an estimate ŝ²_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have, on average, the correct variance, we would like to find weights w_k for which ŝ²_y = s²_y, the observed variance of the observations over the same period:

s²_y = Σ_{k=1}^{K} w_k {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − (Σ_{k=1}^{K} w_k (x̄_(k) + ε̄_(k)))²    (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles, the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; ω, τ) = x^{ωτ−1} (1 − x)^{ω−ωτ−1} / B(ωτ, ω − ωτ),   ω > 0, 0 ≤ τ ≤ 1    (10)

where B(α, β) = ∫₀¹ x^{α−1} (1 − x)^{β−1} dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is, however, defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(ω, τ) = [∫_{(k−1)/K}^{k/K} x^{ωτ−1} (1 − x)^{ω−ωτ−1} dx] / B(ωτ, ω − ωτ)    (11)

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k, taking the midpoint (k − 1/2)/K of each interval as the rescaled rank:

τ = Σ_{k=1}^{K} p_k (k − 1/2)/K = (1/K) Σ_{k=1}^{K} k · p_k − 1/(2K)    (12)

In this way, the parameter τ can account for the asymmetry of the probabilities p_k. Note that, for the synthetic EPS, the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between ŝ²_y and s²_y:

ω̂ = arg min_ω | Σ_{k=1}^{K} w_k(ω, τ) {s²_x(k) + s²_ε(k) + (x̄_(k) + ε̄_(k))²} − (Σ_{k=1}^{K} w_k(ω, τ) (x̄_(k) + ε̄_(k)))² − s²_y |    (13)
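Equations (11) to (13) translate into a short calibration routine. The sketch below uses scipy's beta distribution and a bounded scalar minimizer; the function names, the absolute-difference loss, and the search bounds are our assumptions.

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import minimize_scalar

def beta_weights(omega, tau, K):
    """w_k proportional to the integral of a Beta(omega*tau, omega - omega*tau)
    density between (k-1)/K and k/K (Eq. (11))."""
    edges = np.arange(K + 1) / K
    cdf = beta.cdf(edges, omega * tau, omega - omega * tau)
    return np.diff(cdf)

def calibrate_omega(tau, xbar, ebar, s2x, s2e, s2y, K):
    """Find omega minimizing the mismatch between the mixture variance
    implied by Eq. (9) and the observed variance s2y (Eq. (13))."""
    mu = xbar + ebar
    def loss(omega):
        w = beta_weights(omega, tau, K)
        var = np.sum(w * (s2x + s2e + mu**2)) - np.sum(w * mu) ** 2
        return abs(var - s2y)
    return minimize_scalar(loss, bounds=(0.1, 200.0), method='bounded').x
```

With ω = 2 and τ = 0.5 the weights are exactly uniform (the Beta(1, 1) density), recovering the unweighted case; larger ω concentrates the weight on the middle ranks, smaller ω on the extremes.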

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 103. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts obtained using the method of WB05 (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

[Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).]

[Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and of ensemble members obtained using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.]

[Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.]

It can be noted on Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that, for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also on Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method, we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section, we address this issue using data from an operational EPS.


[Figure 10. Location of the Châteauguay river basin. The map shows the basin near Montréal, on the Canada–USA border, together with the St. Lawrence and Ottawa rivers and the closest GFS grid point.]

5 PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section, we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Châteauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that, for days 1 to 8, the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{∞} {F(x) − H(x − y)}² dx    (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that, with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that, consequently, it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
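For an ensemble (empirical CDF) forecast, the integral in Eq. (14) has the closed form CRPS = E|X − y| − ½ E|X − X′|, where X and X′ are drawn independently from the ensemble; a minimal sketch:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast (Eq. (14)), via the identity
    CRPS = E|X - obs| - 0.5 * E|X - X'| for the empirical CDF."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - obs).mean()
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2
```

For a single-member ensemble this reduces to the absolute error |x − y|, consistent with the CRPS being a generalization of the mean absolute error.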

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period, for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then, the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


[Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two years (Jan 1979 to Dec 1980) and (b) six years (Jan 1979 to Dec 1984). Legend: climatology, EPS, RS03, WB05, weighted members method.]

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS, which shows no skill for all lead times, the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.


6 DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Châteauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_tk as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_tk | θ_k), depends only on x_tk and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum-likelihood estimators π̂_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

$$p(y_t \mid \mathbf{X}_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f(y_t - x_{tk} \mid \hat{\theta}_k) \qquad (15)$$

If the dynamical ensemble members are exchangeable prior to their observation, for example if the same model is used for all members, from initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

$$p(y_t \mid \mathbf{X}_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{tk} \mid \theta) \qquad (16)$$

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

$$p(y_t \mid \mathbf{X}_t) \approx \sum_{k=1}^{K} \pi_k \cdot f_{(k)}(y_t - x_{t(k)}) \qquad (17)$$

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
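As a sketch of how the mixture (17) can be sampled in practice (the function and variable names below are illustrative, not from the paper): a statistical member is obtained by picking an order-statistic index k with probability π_k, then adding an error resampled from that order statistic's archive of past best-member errors.

```python
import random

def draw_statistical_members(x_sorted, pi, error_archive, M, rng=None):
    """Draw M statistical members from the mixture of Eq. (17):
    pick order statistic k with probability pi[k], then add an error
    resampled from that order statistic's archive of past errors."""
    rng = rng or random.Random()
    members = []
    for _ in range(M):
        # sample the mixture component (order statistic) with probability pi[k]
        k = rng.choices(range(len(x_sorted)), weights=pi)[0]
        # dress the kth order statistic with a resampled past error
        members.append(x_sorted[k] + rng.choice(error_archive[k]))
    return members
```

With M large, the empirical distribution of the drawn members approximates the mixture (17).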

7 CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which will therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y., 2005: A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097.
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R., 2004: The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262.
Collier, C. G. and Krzysztofowicz, R., 2000: Quantitative precipitation forecasting. J. Hydrol., 239, 1–2.
Davis, D. R., Duckstein, L. and Krzysztofowicz, R., 1979: The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742.
Dempster, A. P., Laird, N. M. and Rubin, D. B., 1977: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B, 39, 1–38.
Eckel, F. A., 2003: 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington.
Feddersen, H. and Andersen, U., 2005: A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408.
Gander, W. and Gautschi, W., 2000: Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101.
Genest, C. and Favre, A.-C., 2006: Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press.
Hall, P. and Zhou, X.-H., 2003: Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224.
Hamill, T. M., Whitaker, J. S. and Mullen, S. L., 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570.
Houdant, B., 2004: 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris.
Jensen, F. V., 2001: Bayesian networks and decision graphs. Springer-Verlag, New York.
Krzysztofowicz, R., 2004: 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm
Legg, T. P. and Mylne, K. R., 2004: Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906.
Lindley, D. V. and Novick, M. R., 1981: The role of exchangeability in inference. Ann. Statistics, 9, 45–58.
Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M., 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174.
Roulston, M. S. and Smith, L. A., 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30.
Sivillo, J. K., Ahlquist, J. E. and Toth, Z., 1997: An ensemble forecasting primer. Weather and Forecasting, 12, 809–818.
Walley, P., 1996: Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57.
Wang, X. and Bishop, C. H., 2005: Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986.
Weisstein, E. W., 2002: CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press.


Figure 5. Probability that the member of rank k is the best member of the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

Figure 6. Mean and standard deviation of the best-member error distribution as a function of the rank k of the member in the ordered ensemble, for (a) μ_a = 0.3 and (b) μ_a = 1.7.

from ε*_(k) to obtain dressed ensemble members:

$$y_{tkn} = x_{t(k)} + \omega \cdot \varepsilon_{t(k)n} \qquad (6)$$

where ε_{t(k)n} is drawn at random from ε*_(k).

(b) Giving a different weight to each dynamical ensemble member

The number of dressed members generated from the kth order statistic should also reflect the probability that this particular member is the best member of the ensemble. If we want to obtain M ensemble members from the original K-member ensemble, then a possibility would be to draw N_k = p_k · M dressed ensemble members from x_{t(k)}. However, as we are still not drawing the statistical members Y_t from the conditional distribution p(y_t | X_t), we would not be guaranteed to obtain statistical members which display the desired amount of variance.
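The probabilities p_k can be estimated by simple counting over an archive of past forecasts: for each forecast, rank the members and record which rank was closest to the verifying observation. A minimal sketch with illustrative names (the paper identifies the best member after centring the forecasts and observations; that step is omitted here for brevity):

```python
def estimate_pk(past_ensembles, past_obs):
    """Estimate p_k, the probability that the member of rank k is the
    best (closest) member, by counting over an archive of past forecasts."""
    K = len(past_ensembles[0])
    counts = [0] * K
    for ens, y in zip(past_ensembles, past_obs):
        ranked = sorted(ens)  # order statistics x_(1) <= ... <= x_(K)
        best = min(range(K), key=lambda k: abs(ranked[k] - y))
        counts[best] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Naive allocation of N_k = p_k * M dressed members per order statistic
p = estimate_pk([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]], [2.1, -0.1])
Nk = [round(pk * 300) for pk in p]  # for a target ensemble of M = 300
```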


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_{t(k)}, and an independent error term ε_{t(k)n} resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be respectively the mean and variance of x_{t(k)}, and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_{t(k)n}. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion w_k of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble:

$$p(y) = \sum_{k=1}^{K} w_k \cdot p(y_{(k)}) \qquad (7)$$

It can be shown that the variance σ²(y) of this finite mixture is given by

$$\sigma^2(y) = \sum_{k=1}^{K} w_k \left\{ \sigma^2(x_{(k)}) + \sigma^2(\varepsilon_{(k)}) + \left( \mu(x_{(k)}) + \mu(\varepsilon_{(k)}) \right)^2 \right\} - \left\{ \sum_{k=1}^{K} w_k \left( \mu(x_{(k)}) + \mu(\varepsilon_{(k)}) \right) \right\}^2 \qquad (8)$$

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by x̄_(k), ε̄_(k), s²_{x(k)} and s²_{ε(k)}, we can hence obtain an estimate ŝ²_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have, on average, the correct variance, we would like to find weights w_k for which ŝ²_y = s²_y, the observed variance of the observations over the same period:

$$s_y^2 = \sum_{k=1}^{K} w_k \left\{ s_{x(k)}^2 + s_{\varepsilon(k)}^2 + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \right\} - \left\{ \sum_{k=1}^{K} w_k (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \right\}^2 \qquad (9)$$
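Evaluating Eq. (9) for a candidate set of weights is straightforward: the estimated variance is the second moment of the mixture minus the square of its mean. A small sketch, with illustrative names:

```python
def mixture_variance(w, x_mean, e_mean, x_var, e_var):
    """Estimated variance of the statistical ensemble, Eq. (9):
    E[Y^2] - (E[Y])^2 for the K-component mixture with weights w."""
    # second moment: sum of w_k * (s2_x + s2_e + (xbar + ebar)^2)
    m2 = sum(wk * (sx + se + (mx + me) ** 2)
             for wk, mx, me, sx, se in zip(w, x_mean, e_mean, x_var, e_var))
    # mean of the mixture: sum of w_k * (xbar + ebar)
    m1 = sum(wk * (mx + me) for wk, mx, me in zip(w, x_mean, e_mean))
    return m2 - m1 ** 2
```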

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

$$f_B(x; \omega, \tau) = \frac{x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}}{B(\omega\tau,\, \omega - \omega\tau)}, \qquad \omega > 0, \quad 0 \le \tau \le 1 \qquad (10)$$

where $B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$ is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is however defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

$$w_k(\omega, \tau) = \frac{\int_{(k-1)/K}^{k/K} x^{\omega\tau - 1} (1 - x)^{\omega - \omega\tau - 1}\, dx}{B(\omega\tau,\, \omega - \omega\tau)} \qquad (11)$$

The expectation of the beta distribution being given by τ, we propose to further constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

$$\tau = \sum_{k=1}^{K} p_k \int_{(k-1)/K}^{k/K} K x \, dx = \frac{1}{K} \sum_{k=1}^{K} k \cdot p_k - \frac{1}{2K} \qquad (12)$$

In this way the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between ŝ²_y and s²_y:

$$\hat{\omega} = \arg\min_{\omega} \left| \sum_{k=1}^{K} w_k(\omega, \tau) \left\{ s_{x(k)}^2 + s_{\varepsilon(k)}^2 + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \right\} - \left\{ \sum_{k=1}^{K} w_k(\omega, \tau) (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \right\}^2 - s_y^2 \right| \qquad (13)$$
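Putting Eqs. (11) to (13) together, calibration reduces to a one-dimensional search over ω. The sketch below is a pure-Python illustration with assumed names: the beta mass in each interval is obtained by midpoint-rule integration (a library incomplete-beta function would normally be used instead), τ follows Eq. (12), and ω is found by grid search rather than a formal optimizer.

```python
def beta_weights(omega, tau, K, n=4000):
    """Weights w_k of Eq. (11): mass of Walley's beta(omega*tau, omega-omega*tau)
    density in each interval ((k-1)/K, k/K], via midpoint-rule integration."""
    a, b = omega * tau, omega - omega * tau
    grid = [(i + 0.5) / n for i in range(n)]
    dens = [x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) for x in grid]
    total = sum(dens)  # normalizes, so B(a, b) never needs computing explicitly
    w = [0.0] * K
    for x, d in zip(grid, dens):
        w[min(int(x * K), K - 1)] += d / total
    return w

def tau_from_pk(p):
    """Rescaled expectation of p_k, Eq. (12)."""
    K = len(p)
    return sum((k + 1) * pk for k, pk in enumerate(p)) / K - 1.0 / (2 * K)

def fit_omega(tau, x_mean, e_mean, x_var, e_var, s2y, omegas):
    """Grid search for Eq. (13): the omega whose implied mixture
    variance is closest to the observed variance s2y."""
    K = len(x_mean)
    def var_for(omega):
        w = beta_weights(omega, tau, K)
        m2 = sum(wk * (sx + se + (mx + me) ** 2)
                 for wk, mx, me, sx, se in zip(w, x_mean, e_mean, x_var, e_var))
        m1 = sum(wk * (mx + me) for wk, mx, me in zip(w, x_mean, e_mean))
        return m2 - m1 ** 2
    return min(omegas, key=lambda om: abs(var_for(om) - s2y))
```

A target variance much smaller than the spread of the order statistics forces a large ω, i.e. nearly all weight on the median of the ensemble, consistent with the behaviour discussed around Fig. 9.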

(c) The weighted members method applied to the synthetic EPS

We have applied the method proposed in this section to the synthetic EPS of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).

Figure 8. A comparison, on a normal probability plot, of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and of those obtained using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members, as a function of μ_a and K.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.


Figure 10. Location of the Chateauguay river basin (on the Canada–US border, near Montréal) and of the closest GFS grid point.

5 PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \{ F(x) - H(x - y) \}^2 \, dx \qquad (14)$$

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
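A minimal sketch of Eq. (14) for an ensemble forecast, with F approximated by the empirical cumulative distribution function and the integral by a midpoint Riemann sum on a finite grid (names are illustrative; for an empirical CDF the equivalent closed form mean|X − y| − ½ mean|X − X′| is cheaper, but the direct integral mirrors the definition):

```python
def crps(ensemble, y, n=20000):
    """CRPS of Eq. (14): integrate (F(x) - H(x - y))^2 with F the
    empirical CDF of the ensemble, over a grid covering both the
    ensemble members and the observation."""
    xs = sorted(ensemble)
    lo = min(xs[0], y) - 1.0
    hi = max(xs[-1], y) + 1.0
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx  # midpoint of each grid cell
        F = sum(1 for m in xs if m <= x) / len(xs)
        H = 1.0 if x >= y else 0.0
        total += (F - H) ** 2 * dx
    return total
```

For a one-member (deterministic) forecast, the CRPS reduces to the absolute error |x − y|, which is the sense in which it generalizes the mean absolute error.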

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the mean precipitation observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.
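The two post-processing steps just described (add back the calibration-period observed mean to unbias the centred statistical members, then replace negative precipitation with zero) can be sketched as follows (the function name is illustrative, not from the paper):

```python
def finalize_precip_members(centred_members, obs_mean):
    """Unbias statistical members by adding back the mean precipitation
    observed during the calibration period, then replace any negative
    precipitation forecast with zero."""
    return [max(0.0, m + obs_mean) for m in centred_members]
```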


Figure 11. CRPS (mm) of four ensemble prediction systems (climatology, dynamical EPS, RS03, WB05 and the weighted members method) as a function of lead time, for calibration periods of (a) two years (Jan 1979 to Dec 1980) and (b) six years (Jan 1979 to Dec 1984).

Figure 11 presents the results obtained for each calibration period It can be seenthat although the methods of RS03 and WB05 always score better than the dynamicalEPS which shows no skill for all lead times the method of RS03 only outperformsclimatology for day one and the method of WB05 only has skill out to day two Themethod proposed in this paper on the other hand has a lower CRPS than climatologyfrom day one to day six reaching on day six the score obtained by the method of WB05on day two Furthermore while the methods of RS03 and WB05 fail to improve uponthe dynamical EPS after day six the weighted members method still does and its CRPSis still close to climatology which means that the statistical members obtained using themethod proposed in this paper could potentially be used out to day ten Note that theresults are quite similar for both calibration periods meaning that the method proposedin this paper can apparently be used even when the calibration period is relatively shortto obtain skilful probabilistic forecasts of precipitation from a dynamical EPS Howeverwhen we use only two years to produce a statistical ensemble of size M = 225 the ratioof Nk to the size of the archive is larger than one for some order statistics meaningthat identical statistical members are being generated by the sampling procedure Thisratio goes down substantially when the method is calibrated on six years of data stayingwell below one for the first six days of forecast ie when the method shows some skillHence while a short calibration period might be sufficient to obtain skilful probabilisticforecasts using the proposed method a longer calibration period will lead to morediversified scenarios which might be necessary in some applications to better definethe risk of an event Note that we could not calibrate the method on a single year ofdata because the size of the archive εlowast

(k)was zero for some order statistics whereas

the method of WB05 could have been used in this case This illustrates that the datarequirements are indeed more important for the weighted members method than for themethods of RS03 and WB05 They are however not prohibitive since we obtained goodresults when using two years instead of one for calibration

1366 V FORTIN et al

6 DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest, e.g. streamflow at the outlet of a watershed, is reliable. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by their distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
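The rank-reordering step at the heart of the Schaake Shuffle can be sketched in a few lines. This is a simplified, single-field sketch of the procedure of Clark et al. (2004); the function name and the random demonstration data are ours:

```python
import numpy as np

def schaake_shuffle(ensemble, template):
    """Reorder each column of `ensemble` (members x variables) so that its
    rank structure matches that of `template`, a matrix of historical
    observations with the same shape (after Clark et al. 2004)."""
    out = np.empty_like(ensemble)
    for j in range(ensemble.shape[1]):
        ranks = np.argsort(np.argsort(template[:, j]))  # rank of each template row
        out[:, j] = np.sort(ensemble[:, j])[ranks]      # sorted values placed by rank
    return out

# demo: a 6-member, 2-variable ensemble shuffled against a random template
rng = np.random.default_rng(1)
ens = rng.normal(size=(6, 2))
tem = rng.normal(size=(6, 2))
shuffled = schaake_shuffle(ens, tem)
```

Each column keeps the marginal values of the ensemble but inherits the space–time rank structure of the historical template, which is exactly what is needed to restore covariability between independently post-processed predictands.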

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators \hat{\pi}_k and \hat{\theta}_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \, f(y_t - x_{t,k} \mid \hat{\theta}_k) \qquad (15)

If the dynamical ensemble members are exchangeable a priori, for example if the same model is used for all members, with initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta) \qquad (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.
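For illustration, a minimal EM fit of the mixture in Eq. (15) might look as follows, assuming Gaussian kernels with a common variance (one of the forms considered by Raftery et al. 2005); the function name and the synthetic demonstration data are ours:

```python
import numpy as np

def bma_em(x, y, iters=200):
    """EM for the BMA mixture of Eq. (15), assuming Gaussian kernels with a
    common variance: f(y_t - x_tk | theta) = N(y_t; x_tk, sigma^2).
    x: (T, K) past forecasts; y: (T,) verifying observations."""
    T, K = x.shape
    pi = np.full(K, 1.0 / K)               # initial weights p(M_k)
    sig2 = np.var(y - x.mean(axis=1))      # initial common error variance
    for _ in range(iters):
        # E-step: responsibility of model M_k for observation y_t
        # (the Gaussian normalizing constant cancels because sigma is shared)
        dens = np.exp(-0.5 * (y[:, None] - x) ** 2 / sig2)
        z = pi * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: re-estimate the weights and the common variance
        pi = z.mean(axis=0)
        sig2 = np.sum(z * (y[:, None] - x) ** 2) / T
    return pi, sig2

# synthetic demo: member 0 is informative, member 1 is strongly biased
rng = np.random.default_rng(0)
x0 = rng.normal(size=400)
x = np.column_stack([x0, x0 + 5.0])
y = x0 + 0.3 * rng.normal(size=400)
weights, var_hat = bma_em(x, y)
```

In this synthetic example the informative member receives almost all of the weight, illustrating how BMA down-weights biased, non-exchangeable members.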

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_{(k)} of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \, f_{(k)}(y_t - x_{t,(k)}) \qquad (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture; joint estimation of the mixing proportions and probability density functions is not feasible.
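The best-member allocation referred to above can be sketched as follows, using the absolute difference as the norm (a minimal sketch; the function name is ours):

```python
import numpy as np

def build_error_archives(forecasts, obs):
    """Best-member allocation: for each past forecast, find which order
    statistic is closest to the observation and store its error in that
    order statistic's archive epsilon*_(k)."""
    T, K = forecasts.shape
    archives = [[] for _ in range(K)]
    for t in range(T):
        xs = np.sort(forecasts[t])                # order statistics x_t(1)..x_t(K)
        k = int(np.argmin(np.abs(xs - obs[t])))   # index of the best member
        archives[k].append(obs[t] - xs[k])        # its error
    # empirical p_k: probability that the kth order statistic is best
    p = np.array([len(a) for a in archives]) / T
    return archives, p

# tiny demo: two forecasts of a 2-member ensemble
archives, p = build_error_archives(np.array([[3.0, 0.0], [10.0, 2.0]]),
                                   np.array([1.0, 9.0]))
```

The per-order-statistic archives double as training datasets for the component densities f_{(k)}, and the archive sizes give the empirical probabilities p_k.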

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1369

REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262

Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38

Eckel, F. A. 2003 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W. 2000 Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)

Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57

Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


An alternative is to optimize the number of dressed ensemble members drawn from each dynamical member so as to get the correct variance. Let w_k be the proportion of the statistical members that are generated from the kth order statistic of each dynamical ensemble. If we build an archive by mixing statistical ensemble members from different forecasts, each statistical ensemble member y will have been obtained as the sum of the kth order statistic of a dynamical ensemble, x_{t,(k)}, and an independent error term ε_{t,(k),n} resampled from ε*_(k). Let μ(x_(k)) and σ²(x_(k)) be, respectively, the mean and variance of x_{t,(k)}, and μ(ε_(k)) and σ²(ε_(k)) be the mean and variance of ε_{t,(k),n}. Denote by y_(k) a statistical ensemble member which has been drawn as the sum of a kth order statistic and an error term. It follows that the mean and variance of y_(k) are respectively given by μ(y_(k)) = μ(x_(k)) + μ(ε_(k)) and σ²(y_(k)) = σ²(x_(k)) + σ²(ε_(k)). If the number of forecasts in the archive is large, any given ensemble member y can then be considered to have been drawn from a mixture of K distributions p(y_(k)), where the probability of each distribution corresponds to the proportion w_k of members in the archive which have been drawn from the kth order statistic of a dynamical ensemble:

p(y) = \sum_{k=1}^{K} w_k \, p(y_{(k)}) \qquad (7)

It can be shown that the variance σ²(y) of this finite mixture is given by

\sigma^2(y) = \sum_{k=1}^{K} w_k \bigl\{ \sigma^2(x_{(k)}) + \sigma^2(\varepsilon_{(k)}) + (\mu(x_{(k)}) + \mu(\varepsilon_{(k)}))^2 \bigr\} - \biggl\{ \sum_{k=1}^{K} w_k \, (\mu(x_{(k)}) + \mu(\varepsilon_{(k)})) \biggr\}^2 \qquad (8)
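Equation (8) can be verified numerically by comparing it with direct sampling from the mixture. In the sketch below the weights, component means and variances are arbitrary illustrative values, and the Gaussian component shape is our choice (Eq. (8) holds for any component distributions):

```python
import numpy as np

w = np.array([0.2, 0.5, 0.3])       # proportions w_k (illustrative)
mu = np.array([-1.0, 0.0, 2.0])     # component means mu(x_(k)) + mu(eps_(k))
var = np.array([0.5, 1.0, 2.0])     # component variances sigma^2(x_(k)) + sigma^2(eps_(k))

# Eq. (8): variance of the finite mixture p(y) = sum_k w_k p(y_(k))
mix_var = np.sum(w * (var + mu ** 2)) - np.sum(w * mu) ** 2

# Monte Carlo check by sampling the mixture directly (Gaussian components)
rng = np.random.default_rng(0)
comp = rng.choice(3, size=200_000, p=w)
samples = rng.normal(mu[comp], np.sqrt(var[comp]))
```

For these values Eq. (8) gives 2.44, and the sample variance of the Monte Carlo draws agrees to within sampling error.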

Given estimates of μ(x_(k)), μ(ε_(k)), σ²(x_(k)) and σ²(ε_(k)), respectively denoted by \bar{x}_{(k)}, \bar{\varepsilon}_{(k)}, s^2_{x(k)} and s^2_{\varepsilon(k)}, we can hence obtain an estimate \hat{s}^2_y of σ²(y) for a given set of weights w_k. In order to have statistical ensemble members which have, on average, the correct variance, we would like to find weights w_k for which \hat{s}^2_y = s^2_y, the observed variance of the observations over the same period:

s^2_y = \sum_{k=1}^{K} w_k \bigl\{ s^2_{x(k)} + s^2_{\varepsilon(k)} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \bigr\} - \biggl\{ \sum_{k=1}^{K} w_k \, (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \biggr\}^2 \qquad (9)

There can obviously be multiple solutions to this equation, or no solution for that matter. We propose to constrain the problem further by taking into account the observed shape of p_k, the probability that the best member is the kth order statistic, as a function of k. For underdispersive dynamical ensembles the function p_k is U-shaped, whereas it is bell-shaped for overdispersive ensembles. A simple parametric function which can be used to approximate p_k is the beta probability density function. We use here the parametrization proposed by Walley (1996):

f_B(x; \omega, \tau) = \frac{x^{\omega\tau - 1} \, (1 - x)^{\omega - \omega\tau - 1}}{B(\omega\tau, \, \omega - \omega\tau)}, \qquad \omega > 0, \quad 0 < \tau < 1 \qquad (10)

where B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1} \, dx is the beta function (Weisstein 2002). While being defined by only two parameters, it can be either symmetric (when τ = 0.5), skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), and either bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is, however, defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, . . . , K}. We therefore suggest restricting the weights w_k to be proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

w_k(\omega, \tau) = \frac{\int_{(k-1)/K}^{k/K} x^{\omega\tau - 1} \, (1 - x)^{\omega - \omega\tau - 1} \, dx}{B(\omega\tau, \, \omega - \omega\tau)} \qquad (11)

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

\tau = \sum_{k=1}^{K} \int_{(k-1)/K}^{k/K} K \, p_k \, x \, dx = \frac{1}{K} \sum_{k=1}^{K} k \cdot p_k - \frac{1}{2K} \qquad (12)

In this way the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between \hat{s}^2_y and s^2_y:

\hat{\omega} = \arg\min_{\omega} \biggl| \sum_{k=1}^{K} w_k(\omega, \tau) \bigl\{ s^2_{x(k)} + s^2_{\varepsilon(k)} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2 \bigr\} - \biggl\{ \sum_{k=1}^{K} w_k(\omega, \tau) \, (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)}) \biggr\}^2 - s^2_y \biggr| \qquad (13)
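A numerical sketch of Eqs. (11) and (13) follows. The midpoint quadrature for the beta mass and the grid search over ω are implementation choices of ours, not those of the paper, and the demonstration values are arbitrary:

```python
import numpy as np

def beta_weights(K, omega, tau, n=4000):
    """Eq. (11): mass of the beta(omega*tau, omega - omega*tau) density on
    each interval [(k-1)/K, k/K], by midpoint quadrature (a numerical sketch).
    Normalizing by the total removes the beta-function constant."""
    a, b = omega * tau, omega * (1.0 - tau)
    x = (np.arange(n) + 0.5) / n                       # interval midpoints in (0, 1)
    pdf = x ** (a - 1) * (1 - x) ** (b - 1)            # unnormalized beta density
    w = np.array([pdf[(x >= (k - 1) / K) & (x < k / K)].sum()
                  for k in range(1, K + 1)])
    return w / w.sum()

def calibrate_omega(tau, K, sx2, se2, mx, me, s2y):
    """Eq. (13): grid search for the omega that best matches the mixture
    variance of Eq. (9) to the observed variance s2y."""
    best_omega, best_obj = None, np.inf
    for omega in np.linspace(0.2, 50.0, 500):
        w = beta_weights(K, omega, tau)
        m = mx + me                                    # component means
        var_mix = np.sum(w * (sx2 + se2 + m ** 2)) - np.sum(w * m) ** 2
        obj = abs(var_mix - s2y)
        if obj < best_obj:
            best_omega, best_obj = omega, obj
    return best_omega

# demo: symmetric case (tau = 0.5) where uniform weights (omega = 2) give
# a mixture variance of exactly 3, so the calibrated omega should be near 2
omega_hat = calibrate_omega(0.5, 5,
                            np.full(5, 0.5), np.full(5, 0.5),
                            np.array([-2.0, -1.0, 0.0, 1.0, 2.0]),
                            np.zeros(5), s2y=3.0)
```

With ω < 2 the weights are U-shaped (extreme order statistics emphasized, inflating the variance) and with ω > 2 they are bell-shaped, which is the behaviour the calibration exploits.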

(c) The weighted members method applied to the synthetic EPS system

We have applied the method proposed in this section to the synthetic EPS system of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 103. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts obtained using the method of WB05 (each ensemble consisting of M = N · K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive


Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).


Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and of ensemble members obtained using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.


Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section we address this issue using data from an operational EPS.
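Putting the pieces together, generating statistical members from the per-order-statistic archives can be sketched as follows (names are ours; `archives[k]` plays the role of ε*_(k), and `weights` the role of w_k):

```python
import numpy as np

def dress_ensemble(forecast, archives, weights, M, rng):
    """Generate M statistical members: choose order statistic k with
    probability w_k, then add an error resampled from the archive
    epsilon*_(k) of past best-member errors for that order statistic."""
    xs = np.sort(np.asarray(forecast, dtype=float))   # order statistics x_(1..K)
    ks = rng.choice(len(xs), size=M, p=weights)       # N_k ~ w_k * M on average
    return np.array([xs[k] + rng.choice(archives[k]) for k in ks])

# tiny demo: all weight on the first order statistic, single-error archive
members = dress_ensemble([5.0, 2.0], [[0.5], [-0.5]], [1.0, 0.0], 4,
                         np.random.default_rng(0))
```

Because errors are resampled with replacement, once M·w_k exceeds the archive size the same statistical member is necessarily repeated, which is the diversity limitation discussed above.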


Figure 10. Location of the Chateauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both the sharpness and the reliability of a probabilistic forecast (Hersbach 2000):

\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \{ F(x) - H(x - y) \}^2 \, dx \qquad (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect forecast, concentrated on the observation, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
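For an ensemble, the CRPS of the empirical cumulative distribution function can be computed exactly with the standard identity CRPS = E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from the ensemble (a sketch; the function name is ours):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS (Eq. 14) of the empirical CDF of `members` against a scalar
    observation `obs`, via CRPS = E|X - y| - 0.5 * E|X - X'|."""
    m = np.asarray(members, dtype=float)
    return (np.mean(np.abs(m - obs))
            - 0.5 * np.mean(np.abs(m[:, None] - m[None, :])))
```

For a single-member ensemble the second term vanishes and the CRPS reduces to the absolute error, consistent with its interpretation as a generalization of the mean absolute error.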

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the mean precipitation observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.
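The centring and zero-clamping steps described above can be sketched as follows. This is a simplified sketch that resamples centred errors directly rather than running the full best-member machinery, and all names are ours:

```python
import numpy as np

def dress_centred(forecast, errors, fc_mean, obs_mean, rng):
    """Centre the forecasts with the calibration-period forecast mean,
    dress with centred errors, add back the observed mean, and replace
    negative precipitation forecasts with zero."""
    centred = np.asarray(forecast, dtype=float) - fc_mean
    members = centred + rng.choice(errors, size=len(centred)) + obs_mean
    return np.maximum(members, 0.0)   # clamp negative precipitation to zero

# demo: a large negative error drives both members below zero, so both clamp
members = dress_centred([1.0, 2.0], [-10.0], fc_mean=1.0, obs_mean=0.5,
                        rng=np.random.default_rng(0))
```

Clamping at zero only moves probability mass toward the observation for a non-negative predictand, which is why it cannot degrade the CRPS.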


Figure 11. CRPS of four ensemble prediction systems (the dynamical EPS and the statistical ensembles of RS03, WB05 and the weighted members method) and of climatology, as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period It can be seenthat although the methods of RS03 and WB05 always score better than the dynamicalEPS which shows no skill for all lead times the method of RS03 only outperformsclimatology for day one and the method of WB05 only has skill out to day two Themethod proposed in this paper on the other hand has a lower CRPS than climatologyfrom day one to day six reaching on day six the score obtained by the method of WB05on day two Furthermore while the methods of RS03 and WB05 fail to improve uponthe dynamical EPS after day six the weighted members method still does and its CRPSis still close to climatology which means that the statistical members obtained using themethod proposed in this paper could potentially be used out to day ten Note that theresults are quite similar for both calibration periods meaning that the method proposedin this paper can apparently be used even when the calibration period is relatively shortto obtain skilful probabilistic forecasts of precipitation from a dynamical EPS Howeverwhen we use only two years to produce a statistical ensemble of size M = 225 the ratioof Nk to the size of the archive is larger than one for some order statistics meaningthat identical statistical members are being generated by the sampling procedure Thisratio goes down substantially when the method is calibrated on six years of data stayingwell below one for the first six days of forecast ie when the method shows some skillHence while a short calibration period might be sufficient to obtain skilful probabilisticforecasts using the proposed method a longer calibration period will lead to morediversified scenarios which might be necessary in some applications to better definethe risk of an event Note that we could not calibrate the method on a single year ofdata because the size of the archive εlowast

(k)was zero for some order statistics whereas

the method of WB05 could have been used in this case This illustrates that the datarequirements are indeed more important for the weighted members method than for themethods of RS03 and WB05 They are however not prohibitive since we obtained goodresults when using two years instead of one for calibration

1366 V FORTIN et al

6 DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is notentirely satisfactory as the statistical members are not drawn from the distribution of thepredictand conditional on the EPS outputs However the proposed method is applicableto a wider class of problems than the method of WB05 which inspired it and clearlyleads to better forecasts for the synthetic EPS used by WB05 to illustrate their methodand for the precipitation forecasts over the Chateauguay basin obtained using the GFSEPS reforecast archive On the other hand it requires a larger database of past forecaststhan the method of WB05

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance


of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then use either a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum-likelihood estimators, respectively $\hat{\pi}_k$ and $\hat{\theta}_k$, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

$$p(y_t \mid X_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f(y_t - x_{t,k} \mid \hat{\theta}_k) \qquad (15)$$
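The E- and M-steps that produce the estimates in Eq. (15) can be sketched in a few lines of Python. The sketch below assumes Gaussian dressing kernels with a single variance shared by all members, as in Raftery et al. (2005); the function name and data layout are illustrative, not from the paper.

```python
import math

def bma_em(X, y, iters=200):
    """EM estimation of the BMA mixture of Eq. (15) with Gaussian kernels
    f(y - x_k | theta) = N(y; x_k, sigma^2), sigma shared by all members.
    X: T forecasts, each a list of K member values; y: T observations.
    Returns the weight estimates pi_k and the common sigma."""
    T, K = len(X), len(X[0])
    pi, sigma = [1.0 / K] * K, 1.0
    for _ in range(iters):
        # E-step: responsibility z[t][k] of model M_k for observation y_t
        # (the Gaussian normalizing constant cancels, sigma being shared)
        z = []
        for t in range(T):
            lik = [pi[k] * math.exp(-0.5 * ((y[t] - X[t][k]) / sigma) ** 2)
                   for k in range(K)]
            s = sum(lik) or 1.0
            z.append([l / s for l in lik])
        # M-step: weights are mean responsibilities; the variance is the
        # responsibility-weighted mean squared error
        pi = [sum(z[t][k] for t in range(T)) / T for k in range(K)]
        var = sum(z[t][k] * (y[t] - X[t][k]) ** 2
                  for t in range(T) for k in range(K)) / T
        sigma = math.sqrt(max(var, 1e-12))
    return pi, sigma
```

On data where one member is systematically closest to the verification, the weight estimate for that member approaches one; exchangeable members instead call for the constrained form (16) below.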

If the dynamical ensemble members are exchangeable prior to observation, for example if the same model is used for all members, with initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

$$p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta) \qquad (16)$$

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from


different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

$$p(y_t \mid X_t) \approx \sum_{k=1}^{K} \pi_k \cdot f_{(k)}(y_t - x_{t,(k)}) \qquad (17)$$

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
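Sampling statistical members from the mixture (17) amounts to picking an order statistic with probability π_k and dressing it with an error drawn from the corresponding archive. A minimal Python sketch, with illustrative names (the per-order-statistic error archives ε*_(k) are assumed given):

```python
import random

def dress_ensemble(x, weights, archives, M, rng=None):
    """Draw M statistical members from the mixture of Eq. (17).
    x: dynamical ensemble (K values), sorted internally so that archives[k]
       corresponds to the (k+1)-th order statistic of the ensemble.
    weights: mixing proportions pi_k, summing to one.
    archives: archives[k] is the list of past errors eps*_(k) observed when
       that order statistic was the best member."""
    rng = rng or random.Random(0)
    xs = sorted(x)
    members = []
    for _ in range(M):
        k = rng.choices(range(len(xs)), weights=weights)[0]  # pick model M_k
        members.append(xs[k] + rng.choice(archives[k]))      # dress x_(k)
    return members
```

With degenerate weights and a one-error archive the output is deterministic, which makes the mechanics easy to check; in practice the archives are the calibration-period error databases discussed in Section 5.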

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which will therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Châteauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262

Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B, 39, 1–38

Eckel, F. A. 2003 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W. 2000 Adaptive quadrature – revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)

Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57

Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


skewed to the left (when τ < 0.5) or to the right (when τ > 0.5), be bell-shaped (when ω > 2) or U-shaped (when ω < 2). The beta probability density function is, however, defined on the interval [0, 1], and we are of course looking for a probability density function defined on the set {1, 2, ..., K}. We therefore suggest restricting the weights w_k to being proportional to the integral of a beta probability density function between (k − 1)/K and k/K:

$$w_k(\omega, \tau) = \frac{\int_{(k-1)/K}^{k/K} x^{\omega\tau - 1}\,(1 - x)^{\omega - \omega\tau - 1}\,dx}{B(\omega\tau,\ \omega - \omega\tau)} \qquad (11)$$
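Since the weights of Eq. (11) only need to be proportional to the bin integrals, the 1/B(·,·) factor can be dropped and the integrals approximated numerically. A pure-Python sketch (midpoint quadrature; the function name is ours, not the paper's):

```python
def beta_bin_weights(K, omega, tau, n=200):
    """Weights w_k of Eq. (11): mass of a Beta(omega*tau, omega - omega*tau)
    density in the bin ((k-1)/K, k/K), approximated by an n-point midpoint
    rule and normalized so that the K weights sum to one."""
    a, b = omega * tau, omega - omega * tau
    raw = []
    for k in range(1, K + 1):
        lo, h = (k - 1) / K, 1.0 / (K * n)
        # unnormalized beta density x^(a-1) (1-x)^(b-1) at the bin midpoints
        raw.append(sum(((lo + (i + 0.5) * h) ** (a - 1))
                       * ((1.0 - (lo + (i + 0.5) * h)) ** (b - 1))
                       for i in range(n)) * h)
    total = sum(raw)
    return [r / total for r in raw]
```

For ω = 4 and τ = 0.5 the weights are bell-shaped around the median member; for ω = 1 they are U-shaped, putting more mass on the extreme order statistics, consistent with the shapes described above.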

The expectation of the beta distribution being given by τ, we propose to furthermore constrain the weight function by choosing τ equal to the (rescaled) expectation of p_k:

$$\tau = \sum_{k=1}^{K} \int_{(k-1)/K}^{k/K} K\,p_k\,x\,dx = \frac{1}{K} \sum_{k=1}^{K} k\,p_k - \frac{1}{2K} \qquad (12)$$
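The two sides of Eq. (12) can be checked numerically: the left-hand side is the expectation of x under the piecewise-constant density taking the value K·p_k on ((k − 1)/K, k/K) (our reading of the garbled original), and the right-hand side is its closed form. A quick Python check with arbitrary probabilities p_k:

```python
def tau_integral(p, n=1000):
    """Left-hand side of Eq. (12): E[x] under the piecewise density K*p_k
    on ((k-1)/K, k/K), by midpoint quadrature (exact here, x being linear)."""
    K = len(p)
    total = 0.0
    for k in range(1, K + 1):
        lo, h = (k - 1) / K, 1.0 / (K * n)
        total += sum(K * p[k - 1] * (lo + (i + 0.5) * h) * h for i in range(n))
    return total

def tau_closed_form(p):
    """Right-hand side of Eq. (12): (1/K) sum_k k*p_k - 1/(2K)."""
    K = len(p)
    return sum(k * pk for k, pk in enumerate(p, start=1)) / K - 1.0 / (2 * K)
```

For probabilities p_k symmetric about (K + 1)/2 both expressions give τ = 0.5, matching the synthetic-EPS remark below.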

In this way, the parameter τ can account for the asymmetry of the probabilities p_k. Note that for the synthetic EPS the probabilities p_k are symmetric about K/2, so that τ = 0.5. Having selected a value for τ, we can then find the value of the parameter ω which minimizes the difference between the variance of the statistical ensemble and the variance s²_y of the observations:

$$\hat{\omega} = \arg\min_{\omega}\left|\ \sum_{k=1}^{K} w_k(\omega,\tau)\left(s^2_{x_{(k)}} + s^2_{\varepsilon_{(k)}} + (\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})^2\right) - \left(\sum_{k=1}^{K} w_k(\omega,\tau)\,(\bar{x}_{(k)} + \bar{\varepsilon}_{(k)})\right)^2 - s^2_y\ \right| \qquad (13)$$

(c) The weighted members method applied to the synthetic EPS

We have applied the method proposed in this section to the synthetic EPS of WB05. Figure 7(a) shows that the variance of the statistical ensemble obtained is similar to the variance obtained using the method of WB05, and that it works as well for overdispersive dynamical ensembles. Compared with Fig. 4(b), Fig. 7(b) shows a major improvement for the kurtosis of the statistical ensemble members: while the difference between the kurtosis of the forecasts and the kurtosis of the observations is still always positive, its magnitude is much smaller than for the WB05 method, which means that the probability of extreme events is better estimated.

For example, the kurtosis of the ensemble forecasts obtained using the method of WB05 for μ_a = 0.3 and K = 20 is β₂ = 10.3. To illustrate the fact that the method of WB05 leads to an overestimation of extreme events in this case, we have plotted on normal probability paper the empirical distribution of the verifications, of the first 100 ensemble forecasts (each ensemble consisting of M = N·K = 3000 members), and of the first 100 ensemble forecasts obtained using the new method proposed above (cf. Fig. 8). While the mean and variance of these three samples are almost identical, the distribution of the forecasts obtained using the method of WB05 clearly has much heavier tails than the observations, whereas the proposed method leads to forecasts having a stationary distribution much more difficult to distinguish from the stationary distribution of the observations.

It can be noted in Fig. 7(a) (look in the lower right corner) that the proposed method is not able to produce quite the right amount of variance for highly overdispersive



Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K).


Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20.



Figure 9. Value of the parameter ω of the beta probability density function, optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section, we address this issue using data from an operational EPS.
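The N_k versus archive-size check described above is a one-line diagnostic. A hypothetical helper (p_k, M and the archive sizes are its inputs; a ratio above one flags order statistics whose archived errors must be reused, producing duplicate statistical members):

```python
def archive_ratios(p, M, archive_sizes):
    """Ratio N_k / |eps*_(k)| with N_k = p_k * M, one value per order
    statistic; ratios above one mean the archive is too small."""
    return [p[k] * M / archive_sizes[k] for k in range(len(p))]
```

This is the ratio reported in Section 5 when comparing the two-year and six-year calibration periods.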


Figure 10. Location of the Châteauguay river basin.

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section, we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Châteauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8 the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \left( F(x) - H(x - y) \right)^2 dx \qquad (14)$$

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
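For an ensemble, the integral in Eq. (14) with the empirical CDF has a well-known closed form, CRPS = (1/M) Σ_i |x_i − y| − (1/2M²) Σ_{i,j} |x_i − x_j|, which avoids any numerical integration. A Python sketch:

```python
def crps_ensemble(members, y):
    """CRPS of Eq. (14) evaluated exactly for the empirical (step-function)
    CDF of an ensemble, via the mean-absolute-difference identity."""
    M = len(members)
    term1 = sum(abs(x - y) for x in members) / M
    term2 = sum(abs(a - b) for a in members for b in members) / (2.0 * M * M)
    return term1 - term2
```

Averaging this score over many forecast–observation pairs gives values such as those plotted in Fig. 11; for a one-member ensemble it reduces to the absolute error, consistent with the mean-absolute-error interpretation above.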

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then, the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.
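The centring and clamping steps described above can be sketched as follows (helper names and the one-error-per-member dressing are illustrative simplifications of the actual procedure):

```python
import random

def dress_precip(forecasts, fcst_mean, obs_mean, errors, rng=None):
    """Centre each forecast with the calibration-period forecast mean, dress it
    with an archived (centred) error, add back the observed precipitation mean,
    and replace any negative forecast with zero."""
    rng = rng or random.Random(0)
    members = []
    for x in forecasts:
        raw = (x - fcst_mean) + rng.choice(errors) + obs_mean
        members.append(max(raw, 0.0))   # no negative precipitation
    return members
```

The `max(raw, 0.0)` clamp is the zero lower bound discussed above; everything else is the bias handling of the previous paragraph.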

Figure 11. CRPS (mm) of four ensemble prediction systems (the dynamical EPS and the RS03, WB05 and weighted-members statistical ensembles) and of climatology, as a function of lead time (days), for calibration periods of (a) two years (Jan 1979–Dec 1980) and (b) six years (Jan 1979–Dec 1984).


1368 V FORTIN et al

different deterministic dynamical models often referred to as a poor manrsquos ensemblein which case the ensemble members are typically not exchangeable

The method presented in this paper is also closely related to the BMA Indeed itcan be seen as a mixture model where each order statistic of the ensemble is viewed asa different model Mk where the probability of each model is estimated using the best-member approach as the allocation algorithm and where the probability distribution f(k)

of each component of the mixture is estimated nonparametrically from a database ofpast forecasts

p(yt | Xt ) asympK

sum

k=1

πk middot f(k)(yt minus xt(k)) (17)

Note that univariate mixture models of this type are known to be non-identifiablefrom a nonparametric viewpoint when no training data are available (Hall and Zhou2003) meaning that we must rely on an arbitrary allocation algorithm such as thebest-member approach to build a training dataset for each component of the mixtureJoint estimation of the mixing proportions and probability density functions is notfeasible

7 CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertaintyof a deterministic numerical weather prediction However the number of members inan EPS is limited by computing resources and EPS outputs typically do not providereliable probabilistic forecasts We have shown in this paper that while the best-membermethod of RS03 as modified by WB05 can lead to second-order reliable forecasts foran underdispersive EPS it can also lead to probabilistic forecasts having very heavytails and which therefore will overestimate the probability of extreme events We haveproposed and tested on a synthetic EPS and on a real dataset a new method for dressingensemble members which dresses and weights each ensemble member differently Thisnew method has the advantage of working for both underdispersive and overdispersiveensembles and can lead to forecasts that are more reliable and more skilful Using asynthetic EPS we showed that the method can improve tail probabilities as measuredby the coefficient of kurtosis and using precipitation forecasts from the GFS reforecastexperiment we showed that it can lead to more skilful forecasts as measured by thecontinuous ranked probability score We now plan to test the method with output fromthe Canadian ensemble prediction system with application to streamflow forecastingand to adapt the weighted best-members method to the multidimensional case

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz whom wewish to thank for pointing out to us some limitations of the best-member methodComments from the reviewers and from Hugues Masse also resulted in significantimprovements to the manuscript We also wish to thank Noel Evora for providing thedata from the GFS reforecast experiment and Robert Leconte for providing the dailymean areal precipitation and temperature for the Chateauguay watershed

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1369

REFERENCES

Buizza R Houtekamer P LToth Z Pellerin G Wei Mand Zhu Y

2005 A comparison of the ECMWF MSC and NCEP global ensembleprediction systems Mon Weather Rev 133 1076ndash1097

Clark M Gangopadhyay SHay L Rajagopalan B andWilby R

2004 The Schaake Shuffle A method for reconstructing spacendashtimevariability in forecasted precipitation and temperature fieldsJ Hydrometeorology 5 243ndash262

Collier C G andKrzysztofowicz R

2000 Quantitative precipitation forecasting J Hydrol 239 1ndash2

Davis D R Duckstein L andKrzysztofowicz R

1979 The worth of hydrologic data for nonoptimal decision makingWater Resour Res 15 1733ndash1742

Dempster A P Laird N M andRubin D B

1977 Maximum likelihood from incomplete data via the EM algorithmJ R Stat Soc Series B 39 1ndash38

Eckel F A 2003 lsquoEffective mesoscale short-range ensemble forecastingrsquo PhDthesis University of Washington

Feddersen H and Andersen U 2005 A method for statistical downscaling of seasonal ensemblepredictions Tellus 57A 398ndash408

Gander W and Gautschi W 2000 Adaptive quadraturemdashrevisited BIT Numerical Mathematics 4084ndash101

Genest C and Favre A-C 2006 Everything you always wanted to know about copula modelingbut were afraid to ask J Hydrol Eng in press

Hall P and Zhou X-H 2003 Nonparametric estimation of component distributions in amultivariate mixture Ann Statistics 31 201ndash224

Hamill T M Whitaker J S andMullen S L

2006 Reforecasts An important dataset for improving weatherpredictions Bull Am Meteorol Soc 87 33ndash46

Hersbach H 2000 Decomposition of the continuous ranked probability score forensemble prediction systems Weather and Forecasting 15559ndash570

Houdant B 2004 lsquoContribution a lrsquoamelioration de la prevision hydro-meteorologique operationnelle Pour lrsquousage des probabilitesdans la communication entre acteursrsquo PhD thesis EcoleNationale du Genie Rural des Eaux et des Forets (Paris)

Jensen F V 2001 Bayesian networks and decision graphs Springer-Verlag NewYork

Krzysztofowicz R 2004 lsquoBayesian processor of output A new technique for probabilisticweather forecastingrsquo Paper 42 17th Conference on Proba-bility and Statistics in the Atmospheric Sciences 84th AMSAnnual Meeting Seattle 11ndash15 January 2004 AmericanMeteorological Society Available at httpamsconfexcomams84Annual17PROBSTAabstracts69608htm

Legg T P and Mylne K R 2004 Early warnings of severe weather from ensemble forecast infor-mation Weather and Forecasting 19 891ndash906

Lindley D V and Novick M R 1981 The role of exchangeability in inference Ann Statistics 9 45ndash58Raftery A E Gneiting T

Balabdaoui F andPolakowski M

2005 Using Bayesian model averaging to calibrate forecast ensemblesMon Weather Rev 133 1155ndash1174

Roulston M S and Smith L A 2003 Combining dynamical and statistical ensembles Tellus 55A16ndash30

Sivillo J K Ahlquist J E andToth Z

1997 An ensemble forecasting primer Weather and Forecasting 12809ndash818

Walley P 1996 Inferences from multinomial data Learning about a bag ofmarbles J R Stat Soc 58B 3ndash57

Wang X and Bishop C H 2005 Improvement of ensemble reliability with a new dressing kernelQ J R Meteorol Soc 131 965ndash986

Weisstein E W 2002 CRC Concise encyclopedia of mathematics 2nd edition CRCPress

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1361

Figure 7. (a) Ratio of the variance of the statistical ensemble members generated by the weighted members method proposed in this paper to the variance of the observations, and (b) difference between the kurtosis of these ensemble members and the kurtosis of the observations, as a function of the expected amount of underdispersion (μ_a) and of the size of the dynamical ensemble (K). [Contour plots not reproduced.]

Figure 8. A comparison of the stationary distribution of the observations (+), of ensemble members obtained using the method of WB05 (thick line), and of ensemble members obtained using the weighted members method proposed in this paper (scored line), for μ_a = 0.3 and K = 20. [Normal probability plot not reproduced.]

1362 V FORTIN et al

Figure 9. Value of the parameter ω of the beta probability density function optimized to calibrate the variance of the synthetic ensemble members, as a function of μ_a and K. [Contour plot not reproduced; contour levels range from 1 to over 100.]

dynamical ensembles with very few members. This is better understood if we look at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also on Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*_(k). Furthermore, we would also want the size of the archive to be sufficiently large so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section, we address this issue using data from an operational EPS.
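The feasibility condition just stated, N_k = p_k · M not exceeding the size of the archive ε*_(k), is easy to check directly. A minimal sketch (function and variable names are ours, not the paper's):

```python
def archive_is_sufficient(weights, M, archive_sizes):
    """For each order statistic k, check that the expected number of
    statistical members drawn from it, N_k = p_k * M, does not exceed
    the size of its archive of past errors.  Where the check fails,
    identical statistical members will be generated by the resampling."""
    return [p_k * M <= n_k for p_k, n_k in zip(weights, archive_sizes)]

# Example: with M = 100 statistical members and weights (0.2, 0.5, 0.3),
# the third order statistic's archive of 10 errors is too small.
flags = archive_is_sufficient([0.2, 0.5, 0.3], 100, [25, 60, 10])
```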


Figure 10. Location of the Chateauguay river basin, on the Canada–USA border near Montréal, showing the St Lawrence and Ottawa rivers and the nearest GFS grid point. [Map not reproduced.]

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section, we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Chateauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, and minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8, the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both the sharpness and the reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = \int_{-\infty}^{\infty} \{F(x) - H(x - y)\}^2 \, dx,   (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
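Because the empirical cumulative distribution function of an ensemble is piecewise constant, the integral in Eq. (14) can be computed exactly by summing over the intervals between the sorted members and the observation. A minimal sketch (the function name is ours):

```python
def crps_ensemble(members, y):
    """Continuous ranked probability score of an ensemble forecast
    against the observation y (Eq. 14), computed exactly for the
    empirical (step-function) CDF of the ensemble."""
    xs = sorted(members)
    n = len(xs)
    # Breakpoints of the piecewise-constant integrand; outside this
    # range the integrand is identically zero.
    points = sorted(set(xs) | {y})
    total = 0.0
    for a, b in zip(points[:-1], points[1:]):
        f = sum(1 for x in xs if x <= a) / n  # empirical CDF on (a, b)
        h = 1.0 if a >= y else 0.0            # Heaviside H(x - y) on (a, b)
        total += (f - h) ** 2 * (b - a)
    return total
```

For a single-member "ensemble", the score reduces to the absolute error, consistent with the CRPS being a generalization of the mean absolute error.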

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then, the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.
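The centring, best-member identification and zero-truncation steps just described can be sketched as follows. This is a simplified illustration, not the paper's implementation; the function names are ours, and the means are assumed to be computed over the calibration period:

```python
def best_member_index(forecasts, obs, fcst_mean, obs_mean):
    """Identify the best ensemble member by absolute difference,
    after removing the calibration-period means so that forecast
    bias does not influence the choice."""
    centred_obs = obs - obs_mean
    return min(range(len(forecasts)),
               key=lambda k: abs((forecasts[k] - fcst_mean) - centred_obs))

def finalize_precip(statistical_members, obs_mean):
    """Restore the observed calibration-period mean to the centred
    statistical members, then replace any negative precipitation
    forecast with zero."""
    return [max(0.0, m + obs_mean) for m in statistical_members]
```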

Figure 11. CRPS (mm) of four ensemble prediction systems as a function of lead time (days), for calibration periods of (a) Jan 1979–Dec 1980 and (b) Jan 1979–Dec 1984. Curves shown: climatology, EPS, RS03, WB05 and the weighted members method. [Plots not reproduced.]

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts both for the synthetic EPS used by WB05 to illustrate their method and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by their distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either to use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or to model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation–Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π_k and θ_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \pi_k \, f(y_t - x_{t,k} \mid \theta_k).   (15)

If the dynamical ensemble members are exchangeable prior to the observations, for example if the same model is used for all members, from initial conditions obtained by random perturbation of the observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta).   (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \pi_k \, f_{(k)}(y_t - x_{t,(k)}).   (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
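Drawing statistical members from the mixture of Eq. (17) amounts to picking an order statistic with probability π_k and dressing the corresponding sorted dynamical member with an error resampled from that order statistic's archive. A hedged sketch (function names are ours):

```python
import random

def draw_statistical_members(members, weights, error_archives, M, rng=random):
    """Draw M statistical members from the mixture of Eq. (17):
    pick an order statistic k with probability pi_k, then dress the
    k-th smallest dynamical member with an error resampled from that
    order statistic's archive of past errors."""
    x_sorted = sorted(members)          # order statistics x_(1) <= ... <= x_(K)
    K = len(x_sorted)
    out = []
    for _ in range(M):
        k = rng.choices(range(K), weights=weights)[0]
        out.append(x_sorted[k] + rng.choice(error_archives[k]))
    return out
```

With uniform weights and a single pooled error archive shared by all members, the same sketch reduces to the best-member dressing of Eq. (16).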

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which will therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and, using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005. A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004. The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262

Collier, C. G. and Krzysztofowicz, R. 2000. Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979. The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38

Eckel, F. A. 2003. 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005. A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W. 2000. Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C. 2006. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003. Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006. Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H. 2000. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B. 2004. 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts (Paris)

Jensen, F. V. 2001. Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004. 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004. Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R. 1981. The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A. 2003. Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997. An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P. 1996. Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57

Wang, X. and Bishop, C. H. 2005. Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W. 2002. CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press



Figure 9. Value of the parameter ω of the beta probability density function, optimized to calibrate the variance of the synthetic ensemble members.

dynamical ensembles with very few members. This is better understood by looking at Fig. 9, which shows the value of the parameter ω of the beta probability density function needed in order to calibrate the variance of the synthetic ensemble members using Eq. (13). It can be seen that for highly overdispersive dynamical ensembles having very few members, values over 100 were obtained, which in practice means that only the median of the ensemble is given any weight, and that the variance cannot be further reduced by the proposed method. Of course, one could then resort to reducing the variance of the innovations which are added to the dynamical ensemble members, but it would certainly make more sense to increase the number of dynamical members in the ensemble. Note also in Fig. 9 that only in the upper-left region is ω < 2, meaning that more weight is then given to extreme members of the ensemble. This corresponds to the region for which the original best-member method was underdispersive (cf. Fig. 2).
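The collapse of the weights onto the median for large ω can be illustrated with a short sketch. Equation (13) itself is outside this excerpt; as a stand-in, the code below assumes weights proportional to a symmetric Beta(ω, ω) density evaluated at the midpoints (k − ½)/K, which reproduces the limiting behaviours described in the text: ω = 1 spreads weight evenly over the order statistics, while very large ω puts essentially all weight on the median.

```python
import math

def beta_log_pdf(x, a, b):
    """Log-density of the Beta(a, b) distribution (lgamma avoids overflow)."""
    return ((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def order_statistic_weights(K, omega):
    """Hypothetical weights: Beta(omega, omega) density at midpoints (k-1/2)/K,
    normalized to sum to one.  omega = 1 gives uniform weights; a very large
    omega concentrates essentially all weight on the median order statistic."""
    logs = [beta_log_pdf((k - 0.5) / K, omega, omega) for k in range(1, K + 1)]
    m = max(logs)                       # subtract the max before exponentiating
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [x / s for x in w]

w_uniform = order_statistic_weights(15, 1.0)     # all weights equal to 1/15
w_median = order_statistic_weights(15, 1000.0)   # weight piles up on k = 8
```

This is only an illustration of the weighting behaviour, not the paper's exact Eq. (13).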

(d) Limitations of nonparametric predictive simulation

Drawing from an archive of past errors has the limitation that the size of the dressed ensemble will be limited by the size of the archive. While this is true for all nonparametric probability dressing methods presented in this paper, it is more of a problem for the weighted members method. Indeed, with the proposed method we need a separate archive for each order statistic of the ensemble. If the number of ensemble members is large compared to the size of the training dataset, there can even be no example in the calibration period of one of the order statistics being the best member of the ensemble. In practice, we would like the product N_k = p_k · M to be smaller than the size of the archive ε*(k). Furthermore, we would also want the size of the archive to be sufficiently large, so that its mean and variance can be accurately estimated, as this is needed to calibrate the variance of the statistical ensemble using Eq. (13). This is not a problem for the synthetic EPS presented in this section, given the size of the archive, but it might prove to be an important limitation when the calibration period is short. In the next section, we address this issue using data from an operational EPS.
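The sizing rule above can be checked mechanically. The sketch below (an illustration with invented names, not code from the paper) flags order statistics for which the expected draw count N_k = p_k · M exceeds the number of archived errors:

```python
def archive_deficits(weights, M, archive_sizes):
    """For each order statistic k, compare the expected number of statistical
    members N_k = p_k * M with the number of past errors archived for k.
    Returns the list of k (0-based) whose archive is too small."""
    assert len(weights) == len(archive_sizes)
    short = []
    for k, (p_k, n_avail) in enumerate(zip(weights, archive_sizes)):
        N_k = p_k * M  # expected number of draws from archive k
        if N_k > n_avail:
            short.append(k)
    return short

# e.g. three order statistics, 225 statistical members, uneven archives:
# N_k = [45, 135, 45], so only the middle archive (100 errors) is too small
problems = archive_deficits([0.2, 0.6, 0.2], 225, [50, 100, 50])
```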


Figure 10. Location of the Châteauguay river basin (map labels: Montréal, St-Lawrence River, Ottawa River, Canada–USA border, Châteauguay river basin, GFS grid point).

5. PROBABILISTIC FORECASTING OF PRECIPITATION BASED ON GFS ENSEMBLE FORECASTS

Precipitation is one of the most difficult meteorological variables to forecast, yet forecasting precipitation is a prerequisite for medium-range hydrological forecasting (Collier and Krzysztofowicz 2000). Temperature forecasts are also useful, but mainly for snow-melt events. In this section, we use data from a small watershed in Canada to show that probability dressing methods, and in particular the weighted members method, can make it possible to take advantage of a low-resolution EPS to obtain reliable and skilful precipitation forecasts.

(a) Description of the dataset

The Châteauguay river basin, which will be used for this experiment, is located on the Canada–US border (Fig. 10). The basin area is 2543 km², the mean annual precipitation is about 1000 mm, and the probability of observing a daily mean precipitation of more than 0.2 mm is about 50%. We will forecast daily precipitation on this basin using the output from a 15-member ensemble from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model, running at a spatial resolution of 2.5°. A reforecast experiment was performed using this model, and hence past forecasts are available for this model from 1979 to the present (Hamill et al. 2006). For the purpose of this experiment, we obtained daily temperature and precipitation forecasts from 1979 to 1990, with lead times of one to ten days, for the grid point closest to the basin, located at 45°N and 75°W (cf. Fig. 10). Observations of daily mean areal rainfall, snowfall, minimum and maximum temperature were provided by Robert Leconte (personal communication). From this database, we estimated daily total precipitation by summing rainfall and snowfall, and we estimated daily mean temperature by averaging minimum and maximum temperature.

It was observed that for days 1 to 8, the variance of the temperature forecasts is about 5% higher than the variance of the observations, making it impossible to use the methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation, to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both sharpness and reliability of a probabilistic forecast (Hersbach 2000):

CRPS(F, y) = ∫_{−∞}^{+∞} {F(x) − H(x − y)}² dx,          (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that consequently it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
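For the empirical CDF of an ensemble, the integral in Eq. (14) reduces to the kernel form CRPS = E|X − y| − ½E|X − X′|, a standard identity not spelled out in the text. A minimal sketch:

```python
def crps_ensemble(members, y):
    """CRPS of Eq. (14) for the empirical CDF of an ensemble, via the
    kernel identity CRPS = E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    term1 = sum(abs(x - y) for x in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return term1 - term2

# smaller is better; a single-member "ensemble" reduces to absolute error
crps_ensemble([3.0], 1.0)        # -> 2.0
crps_ensemble([0.0, 1.0], 2.0)   # -> 1.25, matching Eq. (14) integrated directly
```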

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the precipitation mean observed during the calibration period.
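The centring step can be sketched as follows (an illustration of the procedure described above, with function and variable names of our own invention):

```python
def best_member_indices(forecasts, observations):
    """Identify the best member of each ensemble after removing the mean bias.
    `forecasts` is a list of ensembles (one list of member values per time
    step) and `observations` the matching list of observed values."""
    n = len(observations)
    obs_mean = sum(observations) / n
    fc_mean = (sum(x for ens in forecasts for x in ens)
               / sum(len(ens) for ens in forecasts))
    best = []
    for ens, y in zip(forecasts, observations):
        yc = y - obs_mean  # centred observation
        # best member = smallest absolute difference between centred values
        best.append(min(range(len(ens)),
                        key=lambda i: abs((ens[i] - fc_mean) - yc)))
    return best

# member 0 tracks the observations up to a common bias; member 1 drifts
best_member_indices([[3.0, 7.0], [4.0, 10.0], [5.0, 6.0]], [1.0, 2.0, 3.0])
```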

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment, we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


Figure 11. CRPS (mm) of four ensemble prediction systems as a function of lead time (days), for calibration periods of (a) two years (Jan 1979 – Dec 1980) and (b) six years (Jan 1979 – Dec 1984). Curves shown: climatology, EPS, RS03, WB05 and the weighted members method.

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Châteauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases, this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional, but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π_k and θ_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π_k · f(y_t − x_{t,k} | θ_k).          (15)
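As an illustration of the estimation step, the following sketch runs EM for the mixture weights, assuming a Gaussian kernel f with a single shared spread. This is a simplification of our own (Raftery et al. 2005 allow per-model parameters), intended only to show the E-step/M-step structure:

```python
import math, random

def bma_em(errors, n_iter=50):
    """EM estimation of BMA weights pi_k from past forecast errors
    errors[t][k] = y_t - x_{t,k}, assuming a Gaussian kernel f with one
    shared spread sigma (an illustrative choice, not the paper's setup)."""
    T, K = len(errors), len(errors[0])
    pi = [1.0 / K] * K
    sigma2 = sum(e * e for row in errors for e in row) / (T * K)
    for _ in range(n_iter):
        # E-step: responsibility of 'model' M_k for each observation y_t
        resp = []
        for row in errors:
            dens = [pi[k] * math.exp(-row[k] ** 2 / (2.0 * sigma2))
                    for k in range(K)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: update the mixture weights and the shared variance
        pi = [sum(r[k] for r in resp) / T for k in range(K)]
        sigma2 = sum(r[k] * errors[t][k] ** 2
                     for t, r in enumerate(resp) for k in range(K)) / T
    return pi, math.sqrt(sigma2)

random.seed(0)
# synthetic errors: member 0 tracks the observation, members 1-2 are biased
errs = [[random.gauss(0.0, 0.1), 5.0, -5.0] for _ in range(200)]
pi, sigma = bma_em(errs)  # pi[0] dominates, sigma shrinks toward ~0.1
```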

If the dynamical ensemble members are exchangeable prior to their observation, for example if the same model is used for all members, with initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have

p(y_t | X_t) ≈ (1/K) Σ_{k=1}^{K} f(y_t − x_{t,k} | θ).          (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from


different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_(k) of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t | X_t) ≈ Σ_{k=1}^{K} π_k · f_(k)(y_t − x_{t,(k)}).          (17)
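Drawing statistical members from the mixture in Eq. (17) then amounts to picking an order statistic with probability π_k and dressing it with an error resampled from that order statistic's archive. A minimal sketch (function and variable names are ours):

```python
import random

def draw_statistical_members(ensemble, weights, archives, M, rng=random):
    """Sample M statistical members from the Eq. (17) mixture: pick order
    statistic k with probability pi_k, then dress x_(k) with an error
    resampled from that order statistic's archive of past errors."""
    x_sorted = sorted(ensemble)  # order statistics x_(1) <= ... <= x_(K)
    ks = rng.choices(range(len(x_sorted)), weights=weights, k=M)
    return [x_sorted[k] + rng.choice(archives[k]) for k in ks]

rng = random.Random(42)
members = draw_statistical_members(
    ensemble=[2.0, 0.5, 1.0],                    # dynamical members
    weights=[0.25, 0.5, 0.25],                   # pi_k per order statistic
    archives=[[-0.3, 0.1], [0.0], [0.2, -0.1]],  # past errors per k
    M=225, rng=rng)
```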

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Châteauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005 A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004 The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262
Collier, C. G. and Krzysztofowicz, R. 2000 Quantitative precipitation forecasting. J. Hydrol., 239, 1–2
Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979 The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742
Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38
Eckel, F. A. 2003 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington
Feddersen, H. and Andersen, U. 2005 A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408
Gander, W. and Gautschi, W. 2000 Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101
Genest, C. and Favre, A.-C. 2006 Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press
Hall, P. and Zhou, X.-H. 2003 Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224
Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006 Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46
Hersbach, H. 2000 Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570
Houdant, B. 2004 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, Ecole Nationale du Génie Rural, des Eaux et des Forêts, Paris
Jensen, F. V. 2001 Bayesian networks and decision graphs. Springer-Verlag, New York
Krzysztofowicz, R. 2004 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm
Legg, T. P. and Mylne, K. R. 2004 Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906
Lindley, D. V. and Novick, M. R. 1981 The role of exchangeability in inference. Ann. Statistics, 9, 45–58
Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174
Roulston, M. S. and Smith, L. A. 2003 Combining dynamical and statistical ensembles. Tellus, 55A, 16–30
Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997 An ensemble forecasting primer. Weather and Forecasting, 12, 809–818
Walley, P. 1996 Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57
Wang, X. and Bishop, C. H. 2005 Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986
Weisstein, E. W. 2002 CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press


(k)was zero for some order statistics whereas

the method of WB05 could have been used in this case This illustrates that the datarequirements are indeed more important for the weighted members method than for themethods of RS03 and WB05 They are however not prohibitive since we obtained goodresults when using two years instead of one for calibration

1366 V FORTIN et al

6 DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is notentirely satisfactory as the statistical members are not drawn from the distribution of thepredictand conditional on the EPS outputs However the proposed method is applicableto a wider class of problems than the method of WB05 which inspired it and clearlyleads to better forecasts for the synthetic EPS used by WB05 to illustrate their methodand for the precipitation forecasts over the Chateauguay basin obtained using the GFSEPS reforecast archive On the other hand it requires a larger database of past forecaststhan the method of WB05

(a) Robustness of the statistical ensemble prediction systemWith the proposed method to generate a large number of statistical ensemble

members from a smaller number of dynamical ensemble members more statisticalmembers are drawn for some dynamical ensemble members depending on their rankin the dynamical ensemble and on the value of the parameter ω of the beta distributionWhen ω lt 2 more weight is given to extreme members of the ensemble which can leadto less robust inferences in the presence of outliers Indeed if the dynamical ensemblecan sometimes be contaminated with ensemble members which are not drawn by thesame process as the others (dynamicals models being far from perfect can sometimesproduce unrealistic scenarios) then it might not be a good idea to give extra weight toextreme members of the ensemble Hence while we may never obtain a dynamical EPSwhich displays exactly the right amount of variability for all variables we would preferto work with an EPS which leads to a value of ω larger than 2 as in this case lessweight will be given to extreme members of the ensemble This criterion could becomea useful tool to determine if the spread in a dynamical EPS is sufficient for the purposeof robust probabilistic forecasting If ω is smaller than two it might be preferable totry to increase the spread in the dynamical EPS before using it to produce probabilisticforecasts through statistical post-processing Note that the EPS system does not have tobe overdispersive for ω to be larger than two (cf Fig 9 and recall that if μa lt 1 theEPS system is underdispersive)

(b) Multidimensional forecastsThe method proposed in this paper has only been applied to the problem of

forecasting a scalar variable Clearly in many cases the predictand is multidimensionalHowever in some of these cases this is because the variable of interest is really acombination of different meteorological elements at different spatial locations and leadtimes This is the case for example for hydrological forecasting where the inputs arecertainly multidimensional but the output is very often unidimensional In such caseswe suggest that the weights be chosen so that the forecast of the variable of interestis reliable eg streamflow at the outlet of a watershed When it is really necessaryto provide reliable multivariate forecasts it might be necessary to have more degreesof freedom when selecting the weights It then also becomes impossible to rank theensemble members so that the weight and dressing kernel for each member must bebased on some other measure

For multidimensional forecasts but also when the predictand is a scalar it ispossible to identify extreme members of the ensemble by the distance to the ensemblemean where the distance is measured with respect to the norm used to identify thebest member of the ensemble Ensemble members can then be dressed and weightedaccording to this distance RS03 and WB05 both illustrated the fact that the performance

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1367

of the best-member method is highly dependent on the choice of this norm We mayexpect the same with the proposed method It is also interesting to condition the dressingkernel on the distance to neighbouring members as the error distribution of a givenmember if it happens to be the best member of the ensemble is bounded in somedirections Indeed the error of the best member of the ensemble must lie within theVoronoi cell (Weisstein 2002) of the ensemble members otherwise it would not be thebest member of the ensemble

Adapting the proposed method to multidimensional problems is certainly a chal-lenge A solution is to use the weighted members method to model the marginal dis-tributions of the predictands and then use either a post-processing method such asSchaake Shuffle (Clark et al 2004) to recover the spacendashtime covariability betweenthe predictands or model the covariation structure of the predictands independently oftheir marginal distributions using copulas (Genest and Favre 2006)

(c) A comparison with Bayesian model averagingThe predictive distribution from which we propose to draw statistical ensemble

members is a mixture of shifted error distributions where the shifts are given by theorder statistics of the dynamical ensemble members This is very similar to Bayesianmodel averaging (BMA Raftery et al 2005)

The BMA method considers each dynamic ensemble member output xtk as thelocation parameter of a statistical model Mk and then assumes that the distributionof the predictand conditional on model Mk and on the ensemble forecasts p(yt |Xt Mk) = f (yt minus xtk | θk) depends only on xtk and on a set of parameters θk Theestimation algorithm then allocates each past observation yt to a model Mk andproceeds to estimate the probability p(Mk) of each model and the parameters θkTypically this is done using the Expectation-Maximization (EM) algorithm (Dempsteret al 1977) which converges to a local maximum of the likelihood function and leads tomaximum likelihood estimators πk and θk respectively for p(Mk) and θk The predictivedistribution is then approximated by a mixture of predictive distributions

p(yt | Xt ) asympK

sum

k=1

πk middot f (yt minus xtk | θk) (15)

If the dynamic ensemble members are exchangeable prior to their observations forexample if the same model is used for all members from initial conditions obtained froma random perturbation of observed initial conditions then the predictive distribution ofyt should be the same for any permutation of the dynamic ensemble members andneither the weights p(Mk) nor the parameters θk can vary with k so that we then have

p(yt | Xt ) asymp 1

K

Ksum

k=1

f (yt minus xtk | θ) (16)

This is exactly the hypothesis underlying the best-member method of RS03 oras modified by WB05 (cf Eq (2)) For EPS with exchangeable members the threemethods only differ in implementation details mainly the choice of the function f andthe estimation method for θ Furthermore parameter estimation by the EM algorithm aswell as by Markov Chain Monte Carlo (MCMC) techniques can fail to provide reliableparameter estimates because of identifiability problems Note that the BMA methodwas essentially proposed by Raftery et al (2005) as a mean of combining outputs from

1368 V FORTIN et al

different deterministic dynamical models often referred to as a poor manrsquos ensemblein which case the ensemble members are typically not exchangeable

The method presented in this paper is also closely related to the BMA Indeed itcan be seen as a mixture model where each order statistic of the ensemble is viewed asa different model Mk where the probability of each model is estimated using the best-member approach as the allocation algorithm and where the probability distribution f(k)

of each component of the mixture is estimated nonparametrically from a database ofpast forecasts

p(yt | Xt ) asympK

sum

k=1

πk middot f(k)(yt minus xt(k)) (17)

Note that univariate mixture models of this type are known to be non-identifiablefrom a nonparametric viewpoint when no training data are available (Hall and Zhou2003) meaning that we must rely on an arbitrary allocation algorithm such as thebest-member approach to build a training dataset for each component of the mixtureJoint estimation of the mixing proportions and probability density functions is notfeasible

7 CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertaintyof a deterministic numerical weather prediction However the number of members inan EPS is limited by computing resources and EPS outputs typically do not providereliable probabilistic forecasts We have shown in this paper that while the best-membermethod of RS03 as modified by WB05 can lead to second-order reliable forecasts foran underdispersive EPS it can also lead to probabilistic forecasts having very heavytails and which therefore will overestimate the probability of extreme events We haveproposed and tested on a synthetic EPS and on a real dataset a new method for dressingensemble members which dresses and weights each ensemble member differently Thisnew method has the advantage of working for both underdispersive and overdispersiveensembles and can lead to forecasts that are more reliable and more skilful Using asynthetic EPS we showed that the method can improve tail probabilities as measuredby the coefficient of kurtosis and using precipitation forecasts from the GFS reforecastexperiment we showed that it can lead to more skilful forecasts as measured by thecontinuous ranked probability score We now plan to test the method with output fromthe Canadian ensemble prediction system with application to streamflow forecastingand to adapt the weighted best-members method to the multidimensional case

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz whom wewish to thank for pointing out to us some limitations of the best-member methodComments from the reviewers and from Hugues Masse also resulted in significantimprovements to the manuscript We also wish to thank Noel Evora for providing thedata from the GFS reforecast experiment and Robert Leconte for providing the dailymean areal precipitation and temperature for the Chateauguay watershed

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1369

REFERENCES

Buizza R Houtekamer P LToth Z Pellerin G Wei Mand Zhu Y

2005 A comparison of the ECMWF MSC and NCEP global ensembleprediction systems Mon Weather Rev 133 1076ndash1097

Clark M Gangopadhyay SHay L Rajagopalan B andWilby R

2004 The Schaake Shuffle A method for reconstructing spacendashtimevariability in forecasted precipitation and temperature fieldsJ Hydrometeorology 5 243ndash262

Collier C G andKrzysztofowicz R

2000 Quantitative precipitation forecasting J Hydrol 239 1ndash2

Davis D R Duckstein L andKrzysztofowicz R

1979 The worth of hydrologic data for nonoptimal decision makingWater Resour Res 15 1733ndash1742

Dempster A P Laird N M andRubin D B

1977 Maximum likelihood from incomplete data via the EM algorithmJ R Stat Soc Series B 39 1ndash38

Eckel F A 2003 lsquoEffective mesoscale short-range ensemble forecastingrsquo PhDthesis University of Washington

Feddersen H and Andersen U 2005 A method for statistical downscaling of seasonal ensemblepredictions Tellus 57A 398ndash408

Gander W and Gautschi W 2000 Adaptive quadraturemdashrevisited BIT Numerical Mathematics 4084ndash101

Genest C and Favre A-C 2006 Everything you always wanted to know about copula modelingbut were afraid to ask J Hydrol Eng in press

Hall P and Zhou X-H 2003 Nonparametric estimation of component distributions in amultivariate mixture Ann Statistics 31 201ndash224

Hamill T M Whitaker J S andMullen S L

2006 Reforecasts An important dataset for improving weatherpredictions Bull Am Meteorol Soc 87 33ndash46

Hersbach H 2000 Decomposition of the continuous ranked probability score forensemble prediction systems Weather and Forecasting 15559ndash570

Houdant B 2004 lsquoContribution a lrsquoamelioration de la prevision hydro-meteorologique operationnelle Pour lrsquousage des probabilitesdans la communication entre acteursrsquo PhD thesis EcoleNationale du Genie Rural des Eaux et des Forets (Paris)

Jensen F V 2001 Bayesian networks and decision graphs Springer-Verlag NewYork

Krzysztofowicz R 2004 lsquoBayesian processor of output A new technique for probabilisticweather forecastingrsquo Paper 42 17th Conference on Proba-bility and Statistics in the Atmospheric Sciences 84th AMSAnnual Meeting Seattle 11ndash15 January 2004 AmericanMeteorological Society Available at httpamsconfexcomams84Annual17PROBSTAabstracts69608htm

Legg T P and Mylne K R 2004 Early warnings of severe weather from ensemble forecast infor-mation Weather and Forecasting 19 891ndash906

Lindley D V and Novick M R 1981 The role of exchangeability in inference Ann Statistics 9 45ndash58Raftery A E Gneiting T

Balabdaoui F andPolakowski M

2005 Using Bayesian model averaging to calibrate forecast ensemblesMon Weather Rev 133 1155ndash1174

Roulston M S and Smith L A 2003 Combining dynamical and statistical ensembles Tellus 55A16ndash30

Sivillo J K Ahlquist J E andToth Z

1997 An ensemble forecasting primer Weather and Forecasting 12809ndash818

Walley P 1996 Inferences from multinomial data Learning about a bag ofmarbles J R Stat Soc 58B 3ndash57

Wang X and Bishop C H 2005 Improvement of ensemble reliability with a new dressing kernelQ J R Meteorol Soc 131 965ndash986

Weisstein E W 2002 CRC Concise encyclopedia of mathematics 2nd edition CRCPress

1364 V FORTIN et al

methods of RS03 and WB05 to calibrate the forecasts. On the other hand, the EPS underestimates the variability of the precipitation for all lead times, making it possible to compare the proposed method with the probability dressing methods of RS03 and WB05. It is also interesting to focus on precipitation to see how the methods behave when forecasting a random variable that is markedly non-normal.

(b) The continuous ranked probability score

While it is interesting to compare the three methods in terms of reliability, it is also interesting to evaluate the sharpness of the forecasts. The continuous ranked probability score (CRPS) is a useful measure of both the sharpness and the reliability of a probabilistic forecast (Hersbach 2000):

\mathrm{CRPS}(F, y) = \int_{-\infty}^{+\infty} \{ F(x) - H(x - y) \}^2 \, dx \qquad (14)

where F(x) is the cumulative distribution function of the forecast, y is the observation, and H(x − y) is the Heaviside step function, i.e. H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. The CRPS measures the departure of the probabilistic forecast from a perfect probabilistic forecast, which the Heaviside function represents. Note that with this score, a smaller value means a better forecast. Note also that the CRPS can be seen as a generalization of the mean absolute error to probabilistic forecasting, and that, consequently, it has the same units as the predictand and the forecasts. In the case of ensemble forecasts, when the size of the sample is sufficiently large, F(x) can be approximated by the empirical cumulative distribution function of the ensemble. To score a forecasting system, one can then compute the average of the CRPS over a large number of events.
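When F is the empirical cumulative distribution function of an M-member ensemble, the integral in Eq. (14) has a closed form, CRPS = E|X − y| − ½ E|X − X′| under that distribution, which avoids numerical integration. A minimal sketch (the function name is ours):

```python
import numpy as np

def crps_ensemble(members, y):
    """Empirical CRPS of an ensemble forecast against the observation y.

    Evaluates Eq. (14) with F taken as the empirical CDF of the members,
    via the identity CRPS = E|X - y| - 0.5 E|X - X'| under that CDF.
    """
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - y))                          # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 E|X - X'|
    return term1 - term2
```

For a single deterministic forecast this reduces to the absolute error, consistent with reading the CRPS as a generalization of the mean absolute error.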

(c) Experimental set-up and results

Using the years 1979 and 1980 to calibrate the statistical methods, we generated (on average, for our method) 15 statistical members from each dynamical member, thus obtaining a statistical ensemble of size M = 225, and computed the CRPS score over the 1985–1990 period for each lead time and each method. To see whether the statistical methods improved upon the dynamical EPS, and whether they showed some skill, we also computed the CRPS for the 15 dynamical members of the EPS and for the climatology. Then, to study the effect of the length of the calibration period on the results, we performed the same analysis using the period 1979–1984 for calibration.

Even if the predictand is univariate, some care must be taken when identifying the best member of each ensemble, because the EPS forecasts can be biased. Clearly, if we identify the best member by minimizing the absolute difference between the observation and the forecasts, then the result will depend on the bias of the forecasts. The simplest solution is to centre the observations and forecasts prior to identifying the best member of each ensemble. Then the best member can be identified using the absolute difference to measure the distance between the forecasts and the observation. Finally, unbiased forecasts can be obtained by adding to the statistical members the mean precipitation observed during the calibration period.
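The centring step above can be sketched as follows; the array layout and function name are ours, and the absolute difference is used as the distance, as in the text:

```python
import numpy as np

def best_member_indices(forecasts, observations):
    """Index of the best member of each past ensemble.

    forecasts: (T, K) array of K ensemble members for T calibration dates.
    observations: (T,) array of verifying observations.
    Both are centred on their calibration-period means first, so that the
    identification of the best member does not depend on the forecast bias.
    """
    fc = forecasts - forecasts.mean()         # centre the forecasts
    obs = observations - observations.mean()  # centre the observations
    # best member = smallest absolute difference to the (centred) observation
    return np.argmin(np.abs(fc - obs[:, None]), axis=1)
```

Unbiased statistical members are then recovered at forecast time by adding back the mean precipitation observed during the calibration period, as described above.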

By design, all probability dressing methods presented in this paper can produce negative forecasts of precipitation. For this experiment we chose to simply replace with zero any negative forecast. In terms of the CRPS, this is equivalent to using a lower bound of zero instead of −∞ in Eq. (14). Hence, ignoring negative forecasts can only improve the score of all probability dressing methods.


[Figure: two panels showing CRPS (mm) against lead time (0–10 days) for climatology, the EPS, and the RS03, WB05 and weighted members methods.]

Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two years (Jan 1979–Dec 1980) and (b) six years (Jan 1979–Dec 1984).

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill for all lead times), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS is still close to climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used, even when the calibration period is relatively short, to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the error archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed greater for the weighted members method than for the methods of RS03 and WB05. They are however not prohibitive, since we obtained good results when using two years instead of one for calibration.


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts for the synthetic EPS used by WB05 to illustrate their method, and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method, to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).
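How a beta-shaped weighting scheme shifts weight between central and extreme ranks can be illustrated with a sketch. Note that the symmetric Beta(a, a) density used here is a stand-in of ours, not the paper's exact parameterization in terms of ω (which is defined earlier in the article, outside this excerpt):

```python
import numpy as np

def rank_weights(K, a, n_grid=100001):
    """Weight of each of the K order statistics, obtained by integrating a
    symmetric Beta(a, a) density over the rank intervals ((k-1)/K, k/K].
    Illustrative only: small a loads the extreme ranks, large a the middle.
    """
    u = np.linspace(0.0, 1.0, n_grid)[1:-1]     # open interval; a < 1 diverges at 0 and 1
    dens = u ** (a - 1) * (1.0 - u) ** (a - 1)  # unnormalized beta density
    dens /= dens.sum()                          # normalize on the grid
    bins = np.minimum((u * K).astype(int), K - 1)
    return np.bincount(bins, weights=dens, minlength=K)

w_extreme = rank_weights(5, a=0.5)  # U-shaped density: extreme members favoured
w_middle = rank_weights(5, a=3.0)   # unimodal density: central members favoured
```

With any such parameterization, a sufficiently peaked density concentrates the statistical members on the central order statistics, which is the robust regime discussed above.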

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional, but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest is reliable, e.g. streamflow at the outlet of a watershed. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by the distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance


of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al. 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
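The Schaake Shuffle step can be sketched as a rank-matching reordering (a simplified reading of Clark et al. 2004; names and array layout are ours): within each dimension, the members are reordered so that their ranks reproduce those of a set of historical observation vectors, which restores a realistic space–time dependence structure.

```python
import numpy as np

def schaake_shuffle(members, historical):
    """Reorder ensemble members so their rank structure across dimensions
    matches that of historical observation vectors (Clark et al. 2004).

    members, historical: (M, D) arrays -- M members / M historical dates,
    D dimensions (sites, lead times). Marginal values are preserved;
    only the ordering within each dimension changes.
    """
    out = np.empty_like(members)
    for d in range(members.shape[1]):
        ranks = np.argsort(np.argsort(historical[:, d]))  # rank of each date
        out[:, d] = np.sort(members[:, d])[ranks]         # impose those ranks
    return out
```

Because only the ordering changes, the marginal distributions produced by the weighted members method are left untouched.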

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{tk} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t − x_{tk} | θ_k), depends only on x_{tk} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation–Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum likelihood estimators π_k and θ_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \pi_k \, f(y_t - x_{tk} \mid \theta_k) \qquad (15)
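The EM estimation step can be sketched for the common special case of Gaussian dressing kernels with a shared variance; the Gaussian choice and the function name are ours, not a prescription of Raftery et al. (2005):

```python
import numpy as np

def fit_bma_em(X, y, n_iter=200):
    """EM for a BMA-style mixture: y_t ~ sum_k pi_k N(x_tk, s2).

    X: (T, K) member forecasts; y: (T,) observations.
    Returns the mixture weights pi (estimates of p(M_k)) and the common
    kernel variance s2 (playing the role of theta).
    """
    T, K = X.shape
    pi = np.full(K, 1.0 / K)
    s2 = np.var(y[:, None] - X)  # initial error variance
    for _ in range(n_iter):
        # E-step: responsibility of model M_k for observation y_t
        dens = np.exp(-((y[:, None] - X) ** 2) / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
        z = pi * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and the common variance
        pi = z.mean(axis=0)
        s2 = np.sum(z * (y[:, None] - X) ** 2) / T
    return pi, s2
```

EM converges only to a local maximum of the likelihood function, so the estimates depend on the initial values.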

If the dynamical ensemble members are exchangeable prior to their observation, for example if the same model is used for all members, from initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

p(y_t \mid X_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{tk} \mid \theta) \qquad (16)

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from


different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_{(k)} of each component of the mixture is estimated nonparametrically from a database of past forecasts:

p(y_t \mid X_t) \approx \sum_{k=1}^{K} \pi_k \, f_{(k)}(y_t - x_{t(k)}) \qquad (17)

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
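The sampling step implied by Eq. (17) can be sketched as follows, assuming the weights π_k and the per-order-statistic error archives have already been estimated (function and variable names are ours):

```python
import numpy as np

def draw_statistical_members(x, weights, error_archives, M, rng=None):
    """Draw M statistical members from the mixture of Eq. (17).

    x: (K,) dynamical ensemble members; weights: (K,) mixture weights pi_k
    for the order statistics; error_archives: list of K 1-D arrays of past
    errors, one archive per order statistic. Each draw picks an order
    statistic k with probability pi_k, then dresses x_(k) with a past error
    resampled from that order statistic's archive.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.sort(x)                              # order statistics x_(k)
    ks = rng.choice(len(xs), size=M, p=weights)  # mixture component per draw
    errors = np.array([rng.choice(error_archives[k]) for k in ks])
    return xs[ks] + errors
```

For precipitation, any negative values produced this way would then be truncated to zero, as in the experiments above.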

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which therefore will overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members, which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis, and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y., 2005: A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097.

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R., 2004: The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262.

Collier, C. G. and Krzysztofowicz, R., 2000: Quantitative precipitation forecasting. J. Hydrol., 239, 1–2.

Davis, D. R., Duckstein, L. and Krzysztofowicz, R., 1979: The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742.

Dempster, A. P., Laird, N. M. and Rubin, D. B., 1977: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B, 39, 1–38.

Eckel, F. A., 2003: 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington.

Feddersen, H. and Andersen, U., 2005: A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408.

Gander, W. and Gautschi, W., 2000: Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101.

Genest, C. and Favre, A.-C., 2006: Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press.

Hall, P. and Zhou, X.-H., 2003: Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224.

Hamill, T. M., Whitaker, J. S. and Mullen, S. L., 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46.

Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570.

Houdant, B., 2004: 'Contribution à l'amélioration de la prévision hydrométéorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris.

Jensen, F. V., 2001: Bayesian networks and decision graphs. Springer-Verlag, New York.

Krzysztofowicz, R., 2004: 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg T P and Mylne K R 2004 Early warnings of severe weather from ensemble forecast infor-mation Weather and Forecasting 19 891ndash906

Lindley D V and Novick M R 1981 The role of exchangeability in inference Ann Statistics 9 45ndash58Raftery A E Gneiting T

Balabdaoui F andPolakowski M

2005 Using Bayesian model averaging to calibrate forecast ensemblesMon Weather Rev 133 1155ndash1174

Roulston M S and Smith L A 2003 Combining dynamical and statistical ensembles Tellus 55A16ndash30

Sivillo J K Ahlquist J E andToth Z

1997 An ensemble forecasting primer Weather and Forecasting 12809ndash818

Walley P 1996 Inferences from multinomial data Learning about a bag ofmarbles J R Stat Soc 58B 3ndash57

Wang X and Bishop C H 2005 Improvement of ensemble reliability with a new dressing kernelQ J R Meteorol Soc 131 965ndash986

Weisstein E W 2002 CRC Concise encyclopedia of mathematics 2nd edition CRCPress

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1365

[Figure: two panels of CRPS (mm) versus lead time (day); panel (a) Jan 1979 - Dec 1980, panel (b) Jan 1979 - Dec 1984; curves for climatology, the dynamical EPS, RS03, WB05 and the weighted members method.]

Figure 11. CRPS of four ensemble prediction systems as a function of lead time, for calibration periods of (a) two and (b) six years.

Figure 11 presents the results obtained for each calibration period. It can be seen that, although the methods of RS03 and WB05 always score better than the dynamical EPS (which shows no skill at any lead time), the method of RS03 only outperforms climatology for day one, and the method of WB05 only has skill out to day two. The method proposed in this paper, on the other hand, has a lower CRPS than climatology from day one to day six, reaching on day six the score obtained by the method of WB05 on day two. Furthermore, while the methods of RS03 and WB05 fail to improve upon the dynamical EPS after day six, the weighted members method still does, and its CRPS remains close to that of climatology, which means that the statistical members obtained using the method proposed in this paper could potentially be used out to day ten. Note that the results are quite similar for both calibration periods, meaning that the method proposed in this paper can apparently be used to obtain skilful probabilistic forecasts of precipitation from a dynamical EPS even when the calibration period is relatively short. However, when we use only two years to produce a statistical ensemble of size M = 225, the ratio of N_k to the size of the archive is larger than one for some order statistics, meaning that identical statistical members are being generated by the sampling procedure. This ratio goes down substantially when the method is calibrated on six years of data, staying well below one for the first six days of forecast, i.e. when the method shows some skill. Hence, while a short calibration period might be sufficient to obtain skilful probabilistic forecasts using the proposed method, a longer calibration period will lead to more diversified scenarios, which might be necessary in some applications to better define the risk of an event. Note that we could not calibrate the method on a single year of data, because the size of the archive ε*_(k) was zero for some order statistics, whereas the method of WB05 could have been used in this case. This illustrates that the data requirements are indeed more important for the weighted members method than for the methods of RS03 and WB05. They are, however, not prohibitive, since we obtained good results when using two years instead of one for calibration.
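The skill comparisons above rest on the continuous ranked probability score. As a minimal sketch (not the decomposition of Hersbach (2000), which is used for diagnostics), the empirical CRPS of an ensemble forecast for a scalar observation can be computed from the standard identity CRPS = E|X - y| - 0.5 E|X - X'|, treating the ensemble as an empirical distribution:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast for a scalar observation.

    Uses the identity CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are
    independent draws from the ensemble's empirical distribution.
    """
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))                     # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 E|X - X'|
    return term1 - term2

# With a single member the CRPS reduces to the absolute error:
print(crps_ensemble([3.0], 5.0))  # -> 2.0
```

Averaging this quantity over many forecast/observation pairs gives the scores plotted in Fig. 11.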


6. DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is not entirely satisfactory, as the statistical members are not drawn from the distribution of the predictand conditional on the EPS outputs. However, the proposed method is applicable to a wider class of problems than the method of WB05 which inspired it, and clearly leads to better forecasts both for the synthetic EPS used by WB05 to illustrate their method and for the precipitation forecasts over the Chateauguay basin obtained using the GFS EPS reforecast archive. On the other hand, it requires a larger database of past forecasts than the method of WB05.

(a) Robustness of the statistical ensemble prediction system

With the proposed method to generate a large number of statistical ensemble members from a smaller number of dynamical ensemble members, more statistical members are drawn for some dynamical ensemble members, depending on their rank in the dynamical ensemble and on the value of the parameter ω of the beta distribution. When ω < 2, more weight is given to extreme members of the ensemble, which can lead to less robust inferences in the presence of outliers. Indeed, if the dynamical ensemble can sometimes be contaminated with ensemble members which are not drawn by the same process as the others (dynamical models, being far from perfect, can sometimes produce unrealistic scenarios), then it might not be a good idea to give extra weight to extreme members of the ensemble. Hence, while we may never obtain a dynamical EPS which displays exactly the right amount of variability for all variables, we would prefer to work with an EPS which leads to a value of ω larger than 2, as in this case less weight will be given to extreme members of the ensemble. This criterion could become a useful tool to determine if the spread in a dynamical EPS is sufficient for the purpose of robust probabilistic forecasting. If ω is smaller than two, it might be preferable to try to increase the spread in the dynamical EPS before using it to produce probabilistic forecasts through statistical post-processing. Note that the EPS does not have to be overdispersive for ω to be larger than two (cf. Fig. 9, and recall that if μ_a < 1 the EPS is underdispersive).
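To make the role of ω concrete, here is a purely illustrative sketch in which the weight π_k of each order statistic is obtained by integrating a symmetric Beta(ω, ω) density over its normalized-rank bin. The symmetric parametrization and the function name are assumptions for illustration; the exact density fitted in the paper may be parametrized differently:

```python
from scipy.stats import beta

def rank_weights(K, omega):
    """Illustrative weights pi_k for the order statistics k = 1..K:
    the mass of a symmetric Beta(omega, omega) density over each
    normalized-rank bin ((k - 1)/K, k/K)."""
    edges = [k / K for k in range(K + 1)]
    cdf = beta.cdf(edges, omega, omega)
    return [float(cdf[k + 1] - cdf[k]) for k in range(K)]

# omega = 1 gives uniform weights (each ~ 1/K); omega < 1 concentrates
# weight on the extreme ranks, while large omega favours central members.
print(rank_weights(5, 0.5))
```

Under this toy parametrization, inspecting the first and central weights shows directly how small ω shifts probability mass toward the extreme members.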

(b) Multidimensional forecasts

The method proposed in this paper has only been applied to the problem of forecasting a scalar variable. Clearly, in many cases the predictand is multidimensional. However, in some of these cases this is because the variable of interest is really a combination of different meteorological elements at different spatial locations and lead times. This is the case, for example, for hydrological forecasting, where the inputs are certainly multidimensional but the output is very often unidimensional. In such cases, we suggest that the weights be chosen so that the forecast of the variable of interest, e.g. streamflow at the outlet of a watershed, is reliable. When it is really necessary to provide reliable multivariate forecasts, it might be necessary to have more degrees of freedom when selecting the weights. It then also becomes impossible to rank the ensemble members, so that the weight and dressing kernel for each member must be based on some other measure.

For multidimensional forecasts, but also when the predictand is a scalar, it is possible to identify extreme members of the ensemble by their distance to the ensemble mean, where the distance is measured with respect to the norm used to identify the best member of the ensemble. Ensemble members can then be dressed and weighted according to this distance. RS03 and WB05 both illustrated the fact that the performance of the best-member method is highly dependent on the choice of this norm; we may expect the same with the proposed method. It is also interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake shuffle (Clark et al. 2004) to recover the space-time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre 2006).
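As a sketch of the first option, the Schaake shuffle of Clark et al. (2004) reorders the ensemble values of each variable so that their joint ranks reproduce those of a set of historical observation vectors. This minimal version assumes equal ensemble and historical sample sizes and no ties:

```python
import numpy as np

def schaake_shuffle(ensemble, historical):
    """Reorder ensemble members (rows) variable-by-variable so that the
    rank structure across variables (columns) mimics that of a set of
    historical observation vectors of the same shape."""
    ens = np.asarray(ensemble, dtype=float)
    hist = np.asarray(historical, dtype=float)
    out = np.empty_like(ens)
    for j in range(ens.shape[1]):
        ranks = np.argsort(np.argsort(hist[:, j]))  # rank of each historical value
        out[:, j] = np.sort(ens[:, j])[ranks]       # member i gets the value of the same rank
    return out
```

Each reshuffled member thus keeps the marginal distribution of the dressed ensemble for every variable, while inheriting the space-time rank dependence of the historical record.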

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al. 2005).

The BMA method considers each dynamical ensemble member output x_{t,k} as the location parameter of a statistical model M_k, and then assumes that the distribution of the predictand, conditional on model M_k and on the ensemble forecasts, p(y_t | X_t, M_k) = f(y_t - x_{t,k} | θ_k), depends only on x_{t,k} and on a set of parameters θ_k. The estimation algorithm then allocates each past observation y_t to a model M_k, and proceeds to estimate the probability p(M_k) of each model and the parameters θ_k. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which converges to a local maximum of the likelihood function and leads to maximum-likelihood estimators π_k and θ̂_k, respectively, for p(M_k) and θ_k. The predictive distribution is then approximated by a mixture of predictive distributions:

$$p(y_t \mid \mathbf{X}_t) \approx \sum_{k=1}^{K} \pi_k \, f(y_t - x_{t,k} \mid \theta_k) \qquad (15)$$
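The EM estimation step described above can be sketched for the special case where the kernel f of Eq. (15) is a Gaussian of known spread, so that only the weights π_k are updated. This simplification and all names are illustrative; the full BMA algorithm of Raftery et al. (2005) also updates the kernel parameters:

```python
import math

def em_bma_weights(X, y, sigma, iters=50):
    """Estimate BMA mixture weights pi_k by EM, with the kernel fixed to a
    Gaussian of known spread sigma (a simplifying assumption).

    X: list of T forecast vectors (K members each); y: list of T observations.
    """
    K = len(X[0])
    pi = [1.0 / K] * K
    phi = lambda e: math.exp(-0.5 * (e / sigma) ** 2)  # unnormalized Gaussian kernel
    for _ in range(iters):
        counts = [0.0] * K
        for xt, yt in zip(X, y):
            # E-step: responsibility of each member for observation yt
            z = [pi[k] * phi(yt - xt[k]) for k in range(K)]
            s = sum(z)
            for k in range(K):
                counts[k] += z[k] / s
        # M-step: the new weights are the mean responsibilities
        pi = [c / len(X) for c in counts]
    return pi
```

As the text notes, EM only guarantees a local maximum of the likelihood, and identifiability problems can make the resulting weights unreliable.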

If the dynamical ensemble members are exchangeable prior to their observation (for example, if the same model is used for all members, starting from random perturbations of the observed initial conditions), then the predictive distribution of y_t should be the same for any permutation of the dynamical ensemble members, and neither the weights p(M_k) nor the parameters θ_k can vary with k, so that we then have:

$$p(y_t \mid \mathbf{X}_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta) \qquad (16)$$

This is exactly the hypothesis underlying the best-member method of RS03, or as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function f and the estimation method for θ. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a poor man's ensemble, in which case the ensemble members are typically not exchangeable.
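With a Gaussian dressing kernel, the equal-weight mixture of Eq. (16) can be evaluated directly. The Gaussian form and the fixed spread are assumptions for illustration; any error density f could be substituted:

```python
import math

def dressed_density(y, members, sigma):
    """Predictive density of Eq. (16) with a Gaussian dressing kernel:
    an equal-weight mixture of normals centred on the K ensemble members."""
    K = len(members)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((y - x) / sigma) ** 2) for x in members) / K
```

Because every member receives the same weight 1/K and the same kernel, the density is invariant under any permutation of the members, which is precisely the exchangeability hypothesis discussed above.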

The method presented in this paper is also closely related to the BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model M_k, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution f_{(k)} of each component of the mixture is estimated nonparametrically from a database of past forecasts:

$$p(y_t \mid \mathbf{X}_t) \approx \sum_{k=1}^{K} \pi_k \, f_{(k)}(y_t - x_{t,(k)}) \qquad (17)$$

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture; joint estimation of the mixing proportions and probability density functions is not feasible.
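The sampling scheme implied by Eq. (17) can be sketched as follows. The names are illustrative, and drawing raw errors from each order statistic's archive stands in for the paper's full dressing procedure: an order statistic k is first picked with probability π_k, and the k-th sorted member is then dressed with an error resampled from that order statistic's archive of past errors ε*_(k):

```python
import random

def weighted_members(sorted_members, weights, error_archives, M, rng=None):
    """Draw M statistical members per Eq. (17): pick an order statistic k
    with probability pi_k, then dress the k-th sorted dynamical member
    with an error resampled from that order statistic's error archive."""
    rng = rng or random.Random()
    K = len(sorted_members)
    draws = rng.choices(range(K), weights=weights, k=M)
    return [sorted_members[k] + rng.choice(error_archives[k]) for k in draws]
```

When M times π_k exceeds the size of the archive for some k, identical statistical members are necessarily produced, which is the sampling limitation noted for the two-year calibration period in section 5.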

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful for estimating the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which will therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Masse also resulted in significant improvements to the manuscript. We also wish to thank Noel Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Chateauguay watershed.


REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005. A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097.
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004. The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262.
Collier, C. G. and Krzysztofowicz, R. 2000. Quantitative precipitation forecasting. J. Hydrol., 239, 1–2.
Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979. The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742.
Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Series B, 39, 1–38.
Eckel, F. A. 2003. 'Effective mesoscale short-range ensemble forecasting'. PhD thesis, University of Washington.
Feddersen, H. and Andersen, U. 2005. A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408.
Gander, W. and Gautschi, W. 2000. Adaptive quadrature – revisited. BIT Numerical Mathematics, 40, 84–101.
Genest, C. and Favre, A.-C. 2006. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press.
Hall, P. and Zhou, X.-H. 2003. Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224.
Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006. Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46.
Hersbach, H. 2000. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570.
Houdant, B. 2004. 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris.
Jensen, F. V. 2001. Bayesian networks and decision graphs. Springer-Verlag, New York.
Krzysztofowicz, R. 2004. 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm
Legg, T. P. and Mylne, K. R. 2004. Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906.
Lindley, D. V. and Novick, M. R. 1981. The role of exchangeability in inference. Ann. Statistics, 9, 45–58.
Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174.
Roulston, M. S. and Smith, L. A. 2003. Combining dynamical and statistical ensembles. Tellus, 55A, 16–30.
Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997. An ensemble forecasting primer. Weather and Forecasting, 12, 809–818.
Walley, P. 1996. Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57.
Wang, X. and Bishop, C. H. 2005. Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986.
Weisstein, E. W. 2002. CRC Concise Encyclopedia of Mathematics, 2nd edition. CRC Press.

1366 V FORTIN et al

6 DISCUSSION

The method proposed in this paper for dressing dynamical ensemble members is notentirely satisfactory as the statistical members are not drawn from the distribution of thepredictand conditional on the EPS outputs However the proposed method is applicableto a wider class of problems than the method of WB05 which inspired it and clearlyleads to better forecasts for the synthetic EPS used by WB05 to illustrate their methodand for the precipitation forecasts over the Chateauguay basin obtained using the GFSEPS reforecast archive On the other hand it requires a larger database of past forecaststhan the method of WB05

(a) Robustness of the statistical ensemble prediction systemWith the proposed method to generate a large number of statistical ensemble

members from a smaller number of dynamical ensemble members more statisticalmembers are drawn for some dynamical ensemble members depending on their rankin the dynamical ensemble and on the value of the parameter ω of the beta distributionWhen ω lt 2 more weight is given to extreme members of the ensemble which can leadto less robust inferences in the presence of outliers Indeed if the dynamical ensemblecan sometimes be contaminated with ensemble members which are not drawn by thesame process as the others (dynamicals models being far from perfect can sometimesproduce unrealistic scenarios) then it might not be a good idea to give extra weight toextreme members of the ensemble Hence while we may never obtain a dynamical EPSwhich displays exactly the right amount of variability for all variables we would preferto work with an EPS which leads to a value of ω larger than 2 as in this case lessweight will be given to extreme members of the ensemble This criterion could becomea useful tool to determine if the spread in a dynamical EPS is sufficient for the purposeof robust probabilistic forecasting If ω is smaller than two it might be preferable totry to increase the spread in the dynamical EPS before using it to produce probabilisticforecasts through statistical post-processing Note that the EPS system does not have tobe overdispersive for ω to be larger than two (cf Fig 9 and recall that if μa lt 1 theEPS system is underdispersive)

(b) Multidimensional forecastsThe method proposed in this paper has only been applied to the problem of

forecasting a scalar variable Clearly in many cases the predictand is multidimensionalHowever in some of these cases this is because the variable of interest is really acombination of different meteorological elements at different spatial locations and leadtimes This is the case for example for hydrological forecasting where the inputs arecertainly multidimensional but the output is very often unidimensional In such caseswe suggest that the weights be chosen so that the forecast of the variable of interestis reliable eg streamflow at the outlet of a watershed When it is really necessaryto provide reliable multivariate forecasts it might be necessary to have more degreesof freedom when selecting the weights It then also becomes impossible to rank theensemble members so that the weight and dressing kernel for each member must bebased on some other measure

For multidimensional forecasts but also when the predictand is a scalar it ispossible to identify extreme members of the ensemble by the distance to the ensemblemean where the distance is measured with respect to the norm used to identify thebest member of the ensemble Ensemble members can then be dressed and weightedaccording to this distance RS03 and WB05 both illustrated the fact that the performance

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1367

of the best-member method is highly dependent on the choice of this norm We mayexpect the same with the proposed method It is also interesting to condition the dressingkernel on the distance to neighbouring members as the error distribution of a givenmember if it happens to be the best member of the ensemble is bounded in somedirections Indeed the error of the best member of the ensemble must lie within theVoronoi cell (Weisstein 2002) of the ensemble members otherwise it would not be thebest member of the ensemble

Adapting the proposed method to multidimensional problems is certainly a chal-lenge A solution is to use the weighted members method to model the marginal dis-tributions of the predictands and then use either a post-processing method such asSchaake Shuffle (Clark et al 2004) to recover the spacendashtime covariability betweenthe predictands or model the covariation structure of the predictands independently oftheir marginal distributions using copulas (Genest and Favre 2006)

(c) A comparison with Bayesian model averagingThe predictive distribution from which we propose to draw statistical ensemble

members is a mixture of shifted error distributions where the shifts are given by theorder statistics of the dynamical ensemble members This is very similar to Bayesianmodel averaging (BMA Raftery et al 2005)

The BMA method considers each dynamic ensemble member output xtk as thelocation parameter of a statistical model Mk and then assumes that the distributionof the predictand conditional on model Mk and on the ensemble forecasts p(yt |Xt Mk) = f (yt minus xtk | θk) depends only on xtk and on a set of parameters θk Theestimation algorithm then allocates each past observation yt to a model Mk andproceeds to estimate the probability p(Mk) of each model and the parameters θkTypically this is done using the Expectation-Maximization (EM) algorithm (Dempsteret al 1977) which converges to a local maximum of the likelihood function and leads tomaximum likelihood estimators πk and θk respectively for p(Mk) and θk The predictivedistribution is then approximated by a mixture of predictive distributions

p(yt | Xt ) asympK

sum

k=1

πk middot f (yt minus xtk | θk) (15)

If the dynamic ensemble members are exchangeable prior to their observations forexample if the same model is used for all members from initial conditions obtained froma random perturbation of observed initial conditions then the predictive distribution ofyt should be the same for any permutation of the dynamic ensemble members andneither the weights p(Mk) nor the parameters θk can vary with k so that we then have

p(yt | Xt ) asymp 1

K

Ksum

k=1

f (yt minus xtk | θ) (16)

This is exactly the hypothesis underlying the best-member method of RS03 oras modified by WB05 (cf Eq (2)) For EPS with exchangeable members the threemethods only differ in implementation details mainly the choice of the function f andthe estimation method for θ Furthermore parameter estimation by the EM algorithm aswell as by Markov Chain Monte Carlo (MCMC) techniques can fail to provide reliableparameter estimates because of identifiability problems Note that the BMA methodwas essentially proposed by Raftery et al (2005) as a mean of combining outputs from

1368 V FORTIN et al

different deterministic dynamical models often referred to as a poor manrsquos ensemblein which case the ensemble members are typically not exchangeable

The method presented in this paper is also closely related to the BMA Indeed itcan be seen as a mixture model where each order statistic of the ensemble is viewed asa different model Mk where the probability of each model is estimated using the best-member approach as the allocation algorithm and where the probability distribution f(k)

of each component of the mixture is estimated nonparametrically from a database ofpast forecasts

p(yt | Xt ) asympK

sum

k=1

πk middot f(k)(yt minus xt(k)) (17)

Note that univariate mixture models of this type are known to be non-identifiablefrom a nonparametric viewpoint when no training data are available (Hall and Zhou2003) meaning that we must rely on an arbitrary allocation algorithm such as thebest-member approach to build a training dataset for each component of the mixtureJoint estimation of the mixing proportions and probability density functions is notfeasible

7 CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertaintyof a deterministic numerical weather prediction However the number of members inan EPS is limited by computing resources and EPS outputs typically do not providereliable probabilistic forecasts We have shown in this paper that while the best-membermethod of RS03 as modified by WB05 can lead to second-order reliable forecasts foran underdispersive EPS it can also lead to probabilistic forecasts having very heavytails and which therefore will overestimate the probability of extreme events We haveproposed and tested on a synthetic EPS and on a real dataset a new method for dressingensemble members which dresses and weights each ensemble member differently Thisnew method has the advantage of working for both underdispersive and overdispersiveensembles and can lead to forecasts that are more reliable and more skilful Using asynthetic EPS we showed that the method can improve tail probabilities as measuredby the coefficient of kurtosis and using precipitation forecasts from the GFS reforecastexperiment we showed that it can lead to more skilful forecasts as measured by thecontinuous ranked probability score We now plan to test the method with output fromthe Canadian ensemble prediction system with application to streamflow forecastingand to adapt the weighted best-members method to the multidimensional case

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz whom wewish to thank for pointing out to us some limitations of the best-member methodComments from the reviewers and from Hugues Masse also resulted in significantimprovements to the manuscript We also wish to thank Noel Evora for providing thedata from the GFS reforecast experiment and Robert Leconte for providing the dailymean areal precipitation and temperature for the Chateauguay watershed

PROBABILISTIC FORECASTING FROM ENSEMBLE PREDICTION SYSTEMS 1369

REFERENCES

Buizza R Houtekamer P LToth Z Pellerin G Wei Mand Zhu Y

2005 A comparison of the ECMWF MSC and NCEP global ensembleprediction systems Mon Weather Rev 133 1076ndash1097

Clark M Gangopadhyay SHay L Rajagopalan B andWilby R

2004 The Schaake Shuffle A method for reconstructing spacendashtimevariability in forecasted precipitation and temperature fieldsJ Hydrometeorology 5 243ndash262

Collier C G andKrzysztofowicz R

2000 Quantitative precipitation forecasting J Hydrol 239 1ndash2

Davis D R Duckstein L andKrzysztofowicz R

1979 The worth of hydrologic data for nonoptimal decision makingWater Resour Res 15 1733ndash1742

Dempster A P Laird N M andRubin D B

1977 Maximum likelihood from incomplete data via the EM algorithmJ R Stat Soc Series B 39 1ndash38

Eckel F A 2003 lsquoEffective mesoscale short-range ensemble forecastingrsquo PhDthesis University of Washington

Feddersen H and Andersen U 2005 A method for statistical downscaling of seasonal ensemblepredictions Tellus 57A 398ndash408

of the best-member method is highly dependent on the choice of this norm. We may expect the same with the proposed method. It would also be interesting to condition the dressing kernel on the distance to neighbouring members, as the error distribution of a given member, if it happens to be the best member of the ensemble, is bounded in some directions. Indeed, the error of the best member of the ensemble must lie within the Voronoi cell (Weisstein, 2002) of the ensemble members; otherwise it would not be the best member of the ensemble.
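The Voronoi-cell bound can be illustrated numerically. The sketch below is ours, not from the paper: with a Euclidean norm, "the observation falls in member k's Voronoi cell" is by definition the same statement as "member k is the best member", so the best member's error vector cannot reach past the mid-plane to any other member.

```python
import numpy as np

rng = np.random.default_rng(0)
members = rng.normal(size=(5, 3))   # 5 ensemble members, 3-dimensional forecasts
obs = rng.normal(size=3)            # verifying observation

dists = np.linalg.norm(members - obs, axis=1)
best = int(np.argmin(dists))        # best member = member nearest to the observation

# The best member's error (obs - members[best]) is bounded in the directions of
# the other members: no other member may be closer, so the error stays inside
# the best member's Voronoi cell.
for k in range(len(members)):
    assert dists[best] <= dists[k]
```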

Adapting the proposed method to multidimensional problems is certainly a challenge. A solution is to use the weighted-members method to model the marginal distributions of the predictands, and then either use a post-processing method such as the Schaake Shuffle (Clark et al., 2004) to recover the space–time covariability between the predictands, or model the covariation structure of the predictands independently of their marginal distributions using copulas (Genest and Favre, 2006).
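The core of the Schaake Shuffle can be sketched in a few lines: independently generated ensemble values are reordered at each location or lead time so that their ranks match those of a set of historical observations, recovering a plausible space–time rank dependence. This is a minimal illustration of the idea in Clark et al. (2004), not their operational code; the variable names and dimensions are ours.

```python
import numpy as np

def schaake_shuffle(ensemble, historical):
    """ensemble, historical: arrays of shape (n_members, n_variables).

    Reorder each column of `ensemble` so its rank pattern across members
    matches the rank pattern of the corresponding column of `historical`.
    """
    shuffled = np.empty_like(ensemble)
    for j in range(ensemble.shape[1]):
        ranks = np.argsort(np.argsort(historical[:, j]))   # rank of each historical member
        shuffled[:, j] = np.sort(ensemble[:, j])[ranks]    # assign ensemble values by rank
    return shuffled

rng = np.random.default_rng(1)
ens = rng.gamma(2.0, size=(10, 4))    # 10 members, 4 lead times (illustrative)
hist = rng.gamma(2.0, size=(10, 4))   # 10 historical trajectories

out = schaake_shuffle(ens, hist)
# Marginals at each lead time are untouched; only the pairing across lead times changes.
assert np.allclose(np.sort(out, axis=0), np.sort(ens, axis=0))
```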

(c) A comparison with Bayesian model averaging

The predictive distribution from which we propose to draw statistical ensemble members is a mixture of shifted error distributions, where the shifts are given by the order statistics of the dynamical ensemble members. This is very similar to Bayesian model averaging (BMA; Raftery et al., 2005).

The BMA method considers each dynamical ensemble member output $x_{t,k}$ as the location parameter of a statistical model $M_k$, and then assumes that the distribution of the predictand, conditional on model $M_k$ and on the ensemble forecasts, $p(y_t \mid \mathbf{X}_t, M_k) = f(y_t - x_{t,k} \mid \theta_k)$, depends only on $x_{t,k}$ and on a set of parameters $\theta_k$. The estimation algorithm then allocates each past observation $y_t$ to a model $M_k$, and proceeds to estimate the probability $p(M_k)$ of each model and the parameters $\theta_k$. Typically, this is done using the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), which converges to a local maximum of the likelihood function and leads to maximum-likelihood estimators $\hat{\pi}_k$ and $\hat{\theta}_k$, respectively, for $p(M_k)$ and $\theta_k$. The predictive distribution is then approximated by a mixture of predictive distributions:

$$
p(y_t \mid \mathbf{X}_t) \approx \sum_{k=1}^{K} \hat{\pi}_k \cdot f(y_t - x_{t,k} \mid \hat{\theta}_k). \qquad (15)
$$
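The mixture of Eq. (15) is simple to evaluate once the weights and kernel parameters are available. The sketch below assumes Gaussian dressing kernels $f(e \mid \theta_k) = N(e; 0, \sigma_k^2)$; in practice the weights and spreads would come from the EM algorithm, whereas the values here are purely illustrative placeholders.

```python
import numpy as np

def bma_density(y, x, weights, sigmas):
    """Mixture density p(y | X) = sum_k w_k * N(y - x_k; 0, sigma_k^2)."""
    comps = np.exp(-0.5 * ((y - x) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return float(np.dot(weights, comps))

x = np.array([1.0, 1.5, 2.0])   # ensemble member forecasts x_{t,k}
w = np.array([0.5, 0.3, 0.2])   # estimated weights pi_k (sum to 1)
s = np.array([0.8, 0.8, 0.8])   # kernel spreads sigma_k

p = bma_density(1.2, x, w, s)   # predictive density at y = 1.2
```

With equal weights $1/K$ and a single common $\theta$, the same function evaluates the exchangeable-member mixture of Eq. (16) below.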

If the dynamical ensemble members are exchangeable prior to their observations, for example if the same model is used for all members, with initial conditions obtained from a random perturbation of observed initial conditions, then the predictive distribution of $y_t$ should be the same for any permutation of the dynamical ensemble members, and neither the weights $p(M_k)$ nor the parameters $\theta_k$ can vary with $k$, so that we then have

$$
p(y_t \mid \mathbf{X}_t) \approx \frac{1}{K} \sum_{k=1}^{K} f(y_t - x_{t,k} \mid \theta). \qquad (16)
$$

This is exactly the hypothesis underlying the best-member method of RS03, as modified by WB05 (cf. Eq. (2)). For an EPS with exchangeable members, the three methods only differ in implementation details, mainly the choice of the function $f$ and the estimation method for $\theta$. Furthermore, parameter estimation by the EM algorithm, as well as by Markov chain Monte Carlo (MCMC) techniques, can fail to provide reliable parameter estimates because of identifiability problems. Note that the BMA method was essentially proposed by Raftery et al. (2005) as a means of combining outputs from different deterministic dynamical models, often referred to as a 'poor man's ensemble', in which case the ensemble members are typically not exchangeable.

The method presented in this paper is also closely related to BMA. Indeed, it can be seen as a mixture model where each order statistic of the ensemble is viewed as a different model $M_k$, where the probability of each model is estimated using the best-member approach as the allocation algorithm, and where the probability distribution $f_{(k)}$ of each component of the mixture is estimated nonparametrically from a database of past forecasts:

$$
p(y_t \mid \mathbf{X}_t) \approx \sum_{k=1}^{K} \pi_k \cdot f_{(k)}(y_t - x_{t,(k)}). \qquad (17)
$$

Note that univariate mixture models of this type are known to be non-identifiable from a nonparametric viewpoint when no training data are available (Hall and Zhou, 2003), meaning that we must rely on an arbitrary allocation algorithm, such as the best-member approach, to build a training dataset for each component of the mixture. Joint estimation of the mixing proportions and probability density functions is not feasible.
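Sampling from the mixture of Eq. (17) amounts to picking an order statistic with probability $\pi_k$ and dressing it with an error drawn from that rank's archive. The sketch below is a schematic of this sampling step: the rank-specific error archives and weights are illustrative placeholders, whereas in the paper they are built from past forecasts with the best-member allocation.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_weighted_members(ensemble, error_archives, weights, n_draws):
    """Draw statistical ensemble members from the Eq.-(17)-style mixture.

    ensemble       : 1-D array of dynamical member forecasts
    error_archives : list of 1-D arrays, past errors of each order statistic
    weights        : mixing proportions pi_k for each order statistic (sum to 1)
    """
    x_sorted = np.sort(ensemble)                         # order statistics x_(k)
    ks = rng.choice(len(x_sorted), size=n_draws, p=weights)
    return np.array([x_sorted[k] + rng.choice(error_archives[k]) for k in ks])

ensemble = np.array([2.1, 0.7, 1.4])
# Hypothetical archives: wider error distributions for higher order statistics.
archives = [rng.normal(0.0, 0.5 + 0.3 * k, size=200) for k in range(3)]
weights = np.array([0.4, 0.35, 0.25])

draws = sample_weighted_members(ensemble, archives, weights, n_draws=1000)
```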

7. CONCLUSION

Ensemble prediction systems (EPS) can be very useful to estimate the uncertainty of a deterministic numerical weather prediction. However, the number of members in an EPS is limited by computing resources, and EPS outputs typically do not provide reliable probabilistic forecasts. We have shown in this paper that, while the best-member method of RS03, as modified by WB05, can lead to second-order reliable forecasts for an underdispersive EPS, it can also lead to probabilistic forecasts having very heavy tails, which will therefore overestimate the probability of extreme events. We have proposed, and tested on a synthetic EPS and on a real dataset, a new method for dressing ensemble members which dresses and weights each ensemble member differently. This new method has the advantage of working for both underdispersive and overdispersive ensembles, and can lead to forecasts that are more reliable and more skilful. Using a synthetic EPS, we showed that the method can improve tail probabilities, as measured by the coefficient of kurtosis; and using precipitation forecasts from the GFS reforecast experiment, we showed that it can lead to more skilful forecasts, as measured by the continuous ranked probability score. We now plan to test the method with output from the Canadian ensemble prediction system, with application to streamflow forecasting, and to adapt the weighted best-members method to the multidimensional case.
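The continuous ranked probability score used above can be estimated directly from an ensemble via its kernel form, $\mathrm{CRPS} = E|X - y| - \tfrac{1}{2}E|X - X'|$; this is the standard sample estimator, equivalent in expectation to the Hersbach (2000) decomposition cited in the references, and is shown here only as a reader's aid, not as the paper's own implementation.

```python
import numpy as np

def crps_ensemble(members, y):
    """Sample CRPS of an ensemble forecast against a scalar observation y."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - y))                                  # E|X - y|
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))   # 0.5 E|X - X'|
    return term1 - term2

score = crps_ensemble([0.8, 1.1, 1.3, 0.9], 1.0)
```

For a single-member "ensemble" the score reduces to the absolute error, so CRPS generalizes the mean absolute error of a deterministic forecast.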

ACKNOWLEDGEMENTS

This paper was initiated by a discussion with Roman Krzysztofowicz, whom we wish to thank for pointing out to us some limitations of the best-member method. Comments from the reviewers and from Hugues Massé also resulted in significant improvements to the manuscript. We also wish to thank Noël Evora for providing the data from the GFS reforecast experiment, and Robert Leconte for providing the daily mean areal precipitation and temperature for the Châteauguay watershed.

REFERENCES

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M. and Zhu, Y. 2005. A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Mon. Weather Rev., 133, 1076–1097

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. 2004. The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorology, 5, 243–262

Collier, C. G. and Krzysztofowicz, R. 2000. Quantitative precipitation forecasting. J. Hydrol., 239, 1–2

Davis, D. R., Duckstein, L. and Krzysztofowicz, R. 1979. The worth of hydrologic data for nonoptimal decision making. Water Resour. Res., 15, 1733–1742

Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B, 39, 1–38

Eckel, F. A. 2003. 'Effective mesoscale, short-range ensemble forecasting'. PhD thesis, University of Washington

Feddersen, H. and Andersen, U. 2005. A method for statistical downscaling of seasonal ensemble predictions. Tellus, 57A, 398–408

Gander, W. and Gautschi, W. 2000. Adaptive quadrature—revisited. BIT Numerical Mathematics, 40, 84–101

Genest, C. and Favre, A.-C. 2006. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., in press

Hall, P. and Zhou, X.-H. 2003. Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statistics, 31, 201–224

Hamill, T. M., Whitaker, J. S. and Mullen, S. L. 2006. Reforecasts: An important dataset for improving weather predictions. Bull. Am. Meteorol. Soc., 87, 33–46

Hersbach, H. 2000. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559–570

Houdant, B. 2004. 'Contribution à l'amélioration de la prévision hydro-météorologique opérationnelle. Pour l'usage des probabilités dans la communication entre acteurs'. PhD thesis, École Nationale du Génie Rural, des Eaux et des Forêts, Paris

Jensen, F. V. 2001. Bayesian networks and decision graphs. Springer-Verlag, New York

Krzysztofowicz, R. 2004. 'Bayesian processor of output: A new technique for probabilistic weather forecasting'. Paper 4.2, 17th Conference on Probability and Statistics in the Atmospheric Sciences, 84th AMS Annual Meeting, Seattle, 11–15 January 2004. American Meteorological Society. Available at http://ams.confex.com/ams/84Annual/17PROBSTA/abstracts/69608.htm

Legg, T. P. and Mylne, K. R. 2004. Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891–906

Lindley, D. V. and Novick, M. R. 1981. The role of exchangeability in inference. Ann. Statistics, 9, 45–58

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. 2005. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev., 133, 1155–1174

Roulston, M. S. and Smith, L. A. 2003. Combining dynamical and statistical ensembles. Tellus, 55A, 16–30

Sivillo, J. K., Ahlquist, J. E. and Toth, Z. 1997. An ensemble forecasting primer. Weather and Forecasting, 12, 809–818

Walley, P. 1996. Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc., 58B, 3–57

Wang, X. and Bishop, C. H. 2005. Improvement of ensemble reliability with a new dressing kernel. Q. J. R. Meteorol. Soc., 131, 965–986

Weisstein, E. W. 2002. CRC Concise encyclopedia of mathematics, 2nd edition. CRC Press
