Estimation of predictive hydrologic uncertainty using the quantile regression and UNEEC methods and...
-
Upload
unesco-ihe -
Category
Documents
-
view
0 -
download
0
Transcript of Estimation of predictive hydrologic uncertainty using the quantile regression and UNEEC methods and...
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
www.hydrol-earth-syst-sci.net/19/3181/2015/
doi:10.5194/hess-19-3181-2015
© Author(s) 2015. CC Attribution 3.0 License.
Estimation of predictive hydrologic uncertainty using the quantile
regression and UNEEC methods and their comparison on
contrasting catchments
N. Dogulu1,a, P. López López1,2,b, D. P. Solomatine1,3, A. H. Weerts2,4, and D. L. Shrestha5
1UNESCO-IHE Institute for Water Education, Delft, the Netherlands2Deltares, Delft, the Netherlands3Delft University of Technology, Delft, the Netherlands4Hydrology and Quantitative Water Management Group, Department of Environmental Sciences, Wageningen University,
Wageningen, the Netherlands5CSIRO Land and Water, Highett, Victoria, Australiaacurrently at: Department of Civil Engineering, Middle East Technical University, Ankara, Turkeybnow at: Utrecht University (Utrecht) and Deltares (Delft), the Netherlands
Correspondence to: N. Dogulu ([email protected])
Received: 14 July 2014 – Published in Hydrol. Earth Syst. Sci. Discuss.: 10 September 2014
Revised: 23 May 2015 – Accepted: 19 June 2015 – Published: 23 July 2015
Abstract. In operational hydrology, estimation of the predic-
tive uncertainty of hydrological models used for flood mod-
elling is essential for risk-based decision making for flood
warning and emergency management. In the literature, there
exists a variety of methods analysing and predicting uncer-
tainty. However, studies devoted to comparing the perfor-
mance of the methods in predicting uncertainty are limited.
This paper focuses on the methods predicting model residual
uncertainty that differ in methodological complexity: quan-
tile regression (QR) and UNcertainty Estimation based on lo-
cal Errors and Clustering (UNEEC). The comparison of the
methods is aimed at investigating how well a simpler method
using fewer input data performs over a more complex method
with more predictors. We test these two methods on several
catchments from the UK that vary in hydrological charac-
teristics and the models used. Special attention is given to
the methods’ performance under different hydrological con-
ditions. Furthermore, normality of model residuals in data
clusters (identified by UNEEC) is analysed. It is found that
basin lag time and forecast lead time have a large impact on
the quantification of uncertainty and the presence of normal-
ity in model residuals’ distribution. In general, it can be said
that both methods give similar results. At the same time, it
is also shown that the UNEEC method provides better per-
formance than QR for small catchments with the changing
hydrological dynamics, i.e. rapid response catchments. It is
recommended that more case studies of catchments of dis-
tinct hydrologic behaviour, with diverse climatic conditions,
and having various hydrological features, be considered.
1 Introduction
The importance of accounting for uncertainty in hydrolog-
ical models used in flood early warning systems is widely
recognized (e.g. Krzysztofowicz, 2001; Pappenberger and
Beven, 2006). Such an uncertainty in the model prediction
stems mainly from four important sources: perceptual model
uncertainty, data uncertainty, parameter estimation uncer-
tainty, and model structural uncertainty (e.g. Solomatine and
Wagener, 2011). Analysis of predictive uncertainty (Todini,
2008) of hydrological models used for flood modelling en-
able hydrologists and managers to achieve better risk-based
decision making and thus has the potential to increase the re-
liability and credibility of flood warning. Therefore, the ne-
cessity of estimating the predictive uncertainty of rainfall–
runoff models is broadly acknowledged in operational hy-
drology, and the management of uncertainty in hydrologic
Published by Copernicus Publications on behalf of the European Geosciences Union.
3182 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
predictions has emerged as a major focus of interest in both
research and operational modelling (Wagener and Gupta,
2005; Liu and Gupta, 2007; Montanari, 2007; Todini, 2008).
In this respect comparing different methods, which are often
developed and tested in isolation, receives the attention of
researchers, e.g. as suggested within the HEPEX framework
(see van Andel et al., 2013).
While the discussions on the necessity of evaluating the
contributions of various sources of errors to the overall model
uncertainty have been going for a long time (see e.g. Gupta
et al., 2005; Brown and Heuvelink, 2005; Liu and Gupta,
2007), there have also been attempts to estimate the residual
uncertainty. By residual uncertainty, we understand the re-
maining model uncertainty assuming that other sources were
accounted for (for example by calibrating the parameters),
or not considered (all other sources like an inaccurate rating
curve, inputs, etc.) (Solomatine and Shrestha, 2009). We rec-
ognize that there are many sources of uncertainty leading to
uncertainty in the model output (their influence is typically
explored by running Monte Carlo experiments). However, in
this paper we consider the uncertainty of model outputs, as-
suming that parameters, inputs and the data used for model
calibration are known (so we do not consider their uncer-
tainty explicitly). Within this context, a (residual) model er-
ror is seen as a manifestation of the (residual) model uncer-
tainty.
In this context, two classes of uncertainty analysis meth-
ods can be considered. The first one relates to the Bayesian
framework with the meta-Gaussian transformation of data
as its important part; these methods are based on a rigor-
ous statistical framework. The following techniques and pa-
pers can be mentioned: the original Bayesian forecasting sys-
tem (BFS) and the Hydrological Uncertainty Processor as
its part (Krzysztofowicz, 1999; Krzysztofowicz and Kelly,
2000); its implementations and variations described in Mon-
tanari and Brath (2004), Reggiani and Weerts (2008), Reg-
giani et al. (2009), and Bogner and Pappenberger (2011); and
the Model Conditional Processor (Todini, 2008; Coccia and
Todini, 2011).
The other class of methods (of which two are dealt with
in this paper) includes more “straightforward” ones which
are directly oriented at predicting the properties (quantiles)
of the residual error distribution by linear or non-linear re-
gression (machine learning) techniques: quantile regression
(QR) (Koenker and Basset, 1978) with its applications in hy-
drology reported by Solomatine and Shrestha (2009), Weerts
et al. (2011), and López López et al. (2013); UNcertainty
Estimation based on local Errors and Clustering (UNEEC)
that uses machine learning techniques (Shrestha and Soloma-
tine, 2006; Solomatine and Shrestha, 2009); and the dynamic
uncertainty model by regression on absolute error (DUMB-
RAE) (Pianosi and Raso, 2012). In this paper we consider
two methods from this class that differ in their methodolog-
ical complexity: quantile regression (QR) and UNcertainty
Estimation based on local Errors and Clustering (UNEEC).
Quantile regression (Koenker and Basset, 1978; Koenker
and Hallock, 2001; Koenker, 2005) is basically a set of lin-
ear regression models (typically, two) where predictands (re-
sponse variables) are the selected quantiles of the conditional
distribution of some variables (discharge or water level in
the present research study), and predictors are lagged values
of the same variable. This methodology allows for examina-
tion of the entire distribution of the variable of interest rather
than a single measure of the central tendency of its distribu-
tion (Koenker, 2005). QR models have been used in a broad
range of applications: economics and financial market anal-
ysis (Kudryavtsev, 2009; Taylor, 2007), agriculture (Barnwal
and Kotani, 2013), meteorology (Bremnes, 2004; Friederichs
and Hense, 2007; Cannon, 2011), wind forecasting (Nielsen
et al., 2006; Møller et al., 2008), the prediction of ozone con-
centrations (Baur et al., 2004; Munir et al., 2012), etc. In hy-
drological modelling the QR method has been applied as an
uncertainty post-processing technique in previous research
studies with different configurations.
The configurations of QR differ mainly in two aspects:
treatment of the quantile crossing problem (a problem when
quantiles of the lower order appear to be larger than those
of the higher order) and the quantiles derivation in normal
space using normal quantile transformation (NQT). Solo-
matine and Shrestha (2009) make use of the classical QR
approach, without considering quantile crossing and NQT.
Weerts et al. (2011), Verkade and Werner (2011), and Roscoe
et al. (2012) apply QR to various deterministic hydrologic
forecasts. The QR configuration investigated in these stud-
ies uses the water level or discharge forecasts as predictors
to estimate the distribution quantiles of the model error. It
includes a transformation into normal space using the NQT
and the quantile crossing problem is addressed by imposing
a fixed distribution of the predictand in the crossing domain.
Singh et al. (2013) make use of a similar configuration dif-
ferentiating two cases based on the similarities in informa-
tion content between calibration and validation data periods.
Coccia and Todini (2011) observe that QR’s usefulness and
performance depend on the assumed patterns in quantiles; for
example, lack of linear variation of the error variance with
the magnitude of the forecasts hinders reasonable estimation
of the quantiles, especially for high flows/water levels. López
López et al. (2014) apply QR to predict the quantiles of the
environmental variables itself (water level) rather than the
quantiles of the model error, and the four different configura-
tions of QR are compared and extensively verified. It should
be noted that, in this study, by design, the only predictor
in QR is the deterministic model output for discharge/water
level, and the quantiles of observed discharge/water level are
estimated through linear regression.
The UNEEC method was introduced 10 years ago
(Shrestha and Solomatine, 2006; Shrestha et al., 2006). The
method builds a non-linear regression model (machine learn-
ing, e.g. an artificial neural network) to estimate the quantiles
of the error distribution, and it assumes that residual uncer-
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3183
tainty depends on the modelled system state characteristics
so that any variable can be used as a predictor. A notable
characteristic of UNEEC is the special attention to achiev-
ing accuracy by local modelling of errors (by clustering and
treating clusters separately), so that particularities of differ-
ent hydrometeorological conditions, i.e. heterogeneities in-
herent to rainfall–runoff process, are represented through dif-
ferent error pdfs. Shrestha and Solomatine (2006) tested the
UNEEC method on the Sieve catchment in Italy based on the
estimates of lower and upper prediction limits correspond-
ing to the 90 % confidence level (CL). The method was also
applied to a different catchment (Brue, in the UK; HBV
model) and it was compared to the Bayesian meta-Gaussian
approach (Montanari and Brath, 2004), as well as the version
of Monte Carlo technique GLUE (Beven and Binley, 1992).
It was reported that the uncertainty estimates obtained by
UNEEC were in fact more acceptable and interpretable than
those obtained by the other methods. UNEEC was further ex-
tended to estimate several quantiles (thus approximating the
full pdf of the error distribution) and applied to the Bagmati
catchment in Nepal (Solomatine and Shrestha, 2009), and it
was compared to several other methods including QR. It was
found that the UNEEC method generated consistent and in-
terpretable results which are more accurate and reliable than
QR. In the further study (Pianosi et al., 2010) UNEEC was
extended so as to include parametric uncertainty (UNEEC-
P); however, local features of uncertainty were not consid-
ered. Nasseri et al. (2013) compared UNEEC with meth-
ods which are mainly based on the fuzzy extension princi-
ple: IMFEP (incremental modified fuzzy extension principle)
and MFEP (modified fuzzy extension principle). It has been
shown that the methods provided similar performance on the
two monthly water balance models for the two basins in Iran
and France.
Solomatine and Shrestha (2009) presented their initial ex-
periments to compare QR and UNEEC on one case study,
and Weerts et al. (2011) discussed the experience with QR
on another one. In this paper we go further and test the newer
variants of these methods on several contrasting catchments
that cover a wide range of climatic conditions and hydrologi-
cal characteristics. The motivation here is to identify possible
advantages and disadvantages of using the QR and UNEEC
methods based on their comparative performance, especially
during flooding conditions (i.e. for the data cluster associ-
ated with high flow/water level conditions). The knowledge
gaps regarding the use of the methods with different param-
eterizations are addressed. For example, we now incorpo-
rate into UNEEC the autoregressive component by consid-
ering past error values (in addition to discharge and effective
rainfall) in one case study, and model outputs for the state
variables soil moisture deficit (SMD) and groundwater level
(GW) are used as predictors (in addition to water level) in
another case study. In the QR version implemented, the lin-
ear regression model was established to predict the quantiles
of observed water levels conditioned on simulated/forecasted
water levels. Furthermore, we present results of statistical
analysis of error time series to better understand (hydrolog-
ical) models’ quality in relation to its effect on uncertainty
analysis results, and to discuss the assumption of normality
in the model residuals, particularly in view of the cluster-
ing approach employed within the framework of the UNEEC
method. We apply methods to estimate predictive uncertainty
in the Brue catchment (southwestern UK) and the Upper Sev-
ern catchments – Yeaton, Llanyblodwel, and Llanerfyl (Mid-
lands, UK).
It should be noticed that by design UNEEC uses a richer
set of predictors than QR and a more sophisticated non-linear
regression model, so the comparison between simple and
complex models may seem unfair. However, more predictors
may not bring more information needed for accurate predic-
tion. Only experiments can allow for stating that for each par-
ticular case. Our experience with the data-driven models (and
both QR and UNEEC are such) showed that adding more
predictors does not necessarily mean higher accuracy on un-
seen data. Parsimony (Box et al., 2008) often leads to better
generalization. In this study we compare the two uncertainty
prediction methods, with the aim of investigating whether a
simpler method using fewer input data may possibly perform
better than the more complex method with more predictors.
Overall, selection of the most appropriate uncertainty pro-
cessor for a specific catchment is a matter of compromise
between its complexity and accuracy in consideration of the
data availability and also the characteristics of the catchment,
and we believe the findings of such a comparative analysis
could be useful for the operational hydrology community.
The remainder of the paper is structured as follows. The
next section describes the residual uncertainty analysis meth-
ods (QR and UNEEC) and the validation measures used. Sec-
tion 3 describes the studied catchments and the experimental
set-up. The results for error and uncertainty analyses are pre-
sented and discussed in Sect. 4. In Sect. 5 the main conclu-
sions from the study and recommendations for future work
are presented.
2 Methodology
2.1 Uncertainty analysis methods
2.1.1 Definitions
As in Solomatine and Shrestha (2009) and Weerts et
al. (2011), we consider a deterministic (hydrological) model
M of a catchment predicting a system output variable y given
the input data vector x (x ∈X) and the vector of model pa-
rameters θ . There are various sources of error associated with
the model output (e.g. discharge), so the system response (i.e.
actual discharge) can be expressed as
yt+LT = y+ e =M(x,θ)+ e, (1)
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3184 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
where e is the total residual error (in the remainder of the
text, the terms “model error” and “model residual” are used
interchangeably to refer to e); t is the (discrete) time. The
model M can be used in two modes depending on the re-
lation between the lead time (LT: the duration between the
time of forecast and the time for which the forecast is made)
of interest and the model time step (1t):{simulation mode, LT= 1 ·1t
forecasting mode, LT> 1 ·1t.
}(2)
Given the model structureM , and the parameter set θ , the un-
certainty analysis methods used in this study, namely QR and
UNEEC, estimate the residual uncertainty of a calibrated hy-
drological model whose parameters and inputs are assumed
to be known exactly. In this set-up the different sources of
uncertainty are not distinguished explicitly. In both methods,
the uncertainty model U predicts the quantile value qτ and
is calibrated for different quantiles (τ ), and for various lead
times (LT) separately:
qτt+LT = U(I,λ), (3)
where I is the input data matrix, and λ is the vector of model
parameters. In a simplest case when the number of quantiles
is two, they form the CL (e.g. 90 %) and the corresponding
confidence interval, CI. The quantiles computed in this study
are τ = 0.05, 0.25, 0.75, and 0.95, allowing for formation of
the 50 and 90 % confidence intervals.
2.1.2 Quantile regression
As mentioned, several QR configurations have been previ-
ously investigated for estimating the residual uncertainty. In
López López et al. (2014) (in open access), the four alterna-
tive configurations of QR for several catchments at the Upper
Severn River have been compared and verified. The compar-
ative analysis included different experiments on the deriva-
tion of regression quantiles in original and in normal space
using NQT, a piecewise linear configuration considering in-
dependent predictand domains and avoiding the quantile
crossing problem with a relatively recent technique (Bondell
et al., 2010). The intercomparison showed that the reliability
and sharpness vary across configurations, but in none of the
configurations do these two forecast quality aspects improve
simultaneously. Further analysis reveals that skills in terms
of the various verification metrics (i.e. Brier skill score, BSS;
mean continuous ranked probability skill core, CRPSS; and
the relative operating characteristic score, ROCS) are very
similar across the four configurations. Therefore, noting also
the main idea behind the current study (which is to investi-
gate how well a simpler method using fewer input data per-
forms over a more complex method with more predictors),
the simplest QR configuration (termed there the “QR1: non-
crossing Quantile Regresssion”) was applied in this study.
QR1 estimates the quantiles of the distribution of water level
Figure 1. Quantile regression example scheme considering differ-
ent quantiles.
or discharge in the original domain, without any initial trans-
formation, and avoids the quantile crossing problem. A brief
description of the QR configuration used in the present work
is given below (for details, the reader is referred to López
López et al., 2014).
For every quantile τ , we assume a linear relationship be-
tween the forecasted (or predicted) value, s, and the real ob-
served value, s,
s = aτ s+ bτ , (4)
where aτ and bτ are the parameters of linear regression. By
minimizing the sum of residuals, one can find the parameters
aτ and bτ :
min
J∑j=1
ρτ (sj − (aτ sj + bτ )), (5)
where sj and sj are the j th paired samples from a total of
J samples and ρτ is the quantile regression function for the
quantile τ :
ρτ (εj )=
{(τ − 1) · εj ,εj ≤ 0,
τ · εj , εj ≥ 0
}. (6)
Equation (6) is applied for the error (εj ), which is defined as
the difference between the observation (sj ) and the linear QR
estimate (aτ sj + bτ ) for the selected quantile τ .
Figure 1 illustrates the estimation of a selection of quan-
tiles, including the 0.95, 0.75, 0.25 and 0.05 quantiles. To
obtain the QR function for a specific quantile, e.g. τ = 0.05,
Eqs. (5) and (6) are applied as follows:
ρ0.05(εj )=
{−0.95 · εj ,εj ≤ 0
0.05 · εj ,εj ≥ 0
}. (7)
In the case of an ideal model, the 95 % of observed–
forecasted pairs would be located above the τ = 0.05 quan-
tile linear regression line, and 5 % would remain below it.
Considering the two observed–forecasted pairs of the total of
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3185
J samples, j = 1 and j = 2, their corresponding errors, ε1
and ε2, are
ε1 = s1− (a0.05s1+ b0.05) < 0;
ε2 = s2− (a0.05s2+ b0.05) > 0. (8)
Introducing both values in Eq. (5), QR allows for solving the
minimization problem calculating the regression parameters
a0.05 and b0.05 for this particular quantile τ = 0.05:
min(−0.95 · ε1+ 0.05 · ε2+ . . .+ ρ0.05(εJ )). (9)
The procedure explained here can be applied for any quan-
tile, τ .
2.1.3 UNEEC
In UNEEC, a machine learning model, e.g. an artificial neu-
ral network, instance-based learning (e.g. k-nearest neigh-
bours) or a M5 model tree, is built to predict uncertainty
associated with the model outputs corresponding to the fu-
ture inputs to a (hydrological) model. The steps involved in
UNEEC are summarized below.
– Identify the set of predictor variables (the lagged rainfall
data, soil moisture, flow, etc.) that describe the flow pro-
cess based on their effect on the model error. These pre-
dictors can be selected using average mutual informa-
tion (AMI) and correlation analysis. Using AMI brings
the advantage of detection of non-linear relationships
(Battiti, 1994).
– Identify the fuzzy clusters in the data set in the space
of predictor variables (using e.g. the fuzzy c-means
method) (Fig. 2). The optimal number of clusters can
be determined using the methods described, e.g. in Xie
and Benie (1991), Halkidi et al. (2001), and Nasseri and
Zahraie (2011).
– For each cluster c, calculate the quantiles, qτc , of the
empirical distribution of the model error, taking into ac-
count however the membership degree of each data vec-
tor to a considered cluster.
– For each data vector, calculate the “global” estimate of
the quantile qτ using the quantiles calculated for each
cluster qτc . This is done by weighting them by the corre-
sponding degree of membership of the given data vector
to this cluster. Calculated qτ values for each quantile τ
are used as outputs for the uncertainty model U .
– Train a machine learning model (U ) (e.g. ANN or M5
model tree) using the set of predictors as inputs, and the
data prepared at the previous step as the output. U will
be able to predict the quantile value qτ for the new input
vectors.
Figure 2. An example of fuzzy clustering of input data (the pre-
dictors are past rainfall at lag t − 2 and past flow at lag t − 1) dur-
ing training of the uncertainty model, U (adapted from Solomatine,
2013).
Various machine learning models can be employed; in this
study the M5 model tree (Quinlan, 1992) has been used for
all case studies. A model tree is a tree-like modular model
which is in fact equivalent to a piecewise linear function.
At non-terminal nodes there are rules that progressively split
data into subsets, and finally the linear regression equations
in the leaves of the tree built on the data subset that reached
this particular leaf. The main reasons for using this tech-
nique are its accuracy, transparency (analytical expressions
for models are obtained explicitly) and speed in training.
Model trees have shown high accuracy in our previous stud-
ies (e.g. Solomatine and Dulal, 2003).
2.2 Validation methods
In this study we use several statistical measures of uncer-
tainty to evaluate and to some extent compare performances
of QR and UNEEC. These are, namely, prediction interval
coverage probability (PICP; Shrestha and Solomatine, 2006),
mean prediction interval (MPI; Shrestha and Solomatine,
2006), and average relative interval length (ARIL; Jin et al.,
2010). PICP has also been used by other authors (e.g. Laio
and Tamea, 2007) as an important performance measure to
estimate the accuracy of probabilistic forecasts.
PICP should be seen as the most important measure since
it shows how many observations fall into the estimated inter-
val. PICP is the probability that the observed values (yt ) lie
within the estimated prediction limits (PL) computed for a
significance level of 1−α (e.g. 90 %):
PICP=1
n
∑n
t=1C, where C ={
1, PLlowert ≤ yt ≤ PL
uppert
0, otherwise
}. (10)
Ideally, the PICP value should be equal to or close to the
specified CL.
MPI computes the average width of the uncertainty band
(or prediction interval), i.e. the distance between the up-
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3186 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
per and lower prediction limits (PLuppert and PLlower
t , respec-
tively):
MPI=1
n
∑n
t=1(PL
uppert −PLlower
t ). (11)
MPI= 0 means there is no uncertainty at all. MPI is a rather
simple indicator giving an idea about the distribution sharp-
ness.
ARIL is similar to MPI and considers the average width of
uncertainty bounds in relation to the observed value:
ARIL=1
n
∑n
t=1
(PLuppert −PLlower
t )
yt. (12)
Having the observed value in the denominator accounts for
the fact that uncertainty (and MPI) is usually higher for
higher values of flow and thus has a “normalization” effect. A
problem with ARIL is that if the flow is 0 or close to 0, ARIL
will be infinity or very high. This problem could be helped
by removing all observations above a certain threshold from
the calculations (a suggestion of one of the reviewers of this
paper); we leave this idea for further testing in the future re-
search.
A possibility to combine PICP and ARIL is to use the NUE
indicator proposed by Nasseri and Zahraie (2011):
NUE=PICP
w×ARIL. (13)
Nasseri and Zahraie (2013) recommend that methods with
the higher NUE should be preferred over those with the lower
NUE; however, we do not think this is a universally applica-
ble recommendation: if for two methods PICP is equal and
close to the confidence interval (90 %) and ARIL for one
method is higher (which is not good), then the NUE for this
method will actually be lower.
There is no single objective measure of the quality of an
uncertainty prediction method (since the “actual” uncertainty
of the model (error pdf) at each time step is not known). The
closer PICP is to the CL, the higher the trust in a partic-
ular uncertainty prediction method should be. In principle,
a reliable method should lead to reasonably low values of
MPI (and ARIL). However, a wide MPI does not mean that
a method estimating the prediction interval is inaccurate – it
could simply mean that the main model is not very accurate,
and the high MPI shows that.
PICP indeed evaluates whether the expected percentage of
observations falls into the predicted interval, and should be
seen as an important average indicator of the predictor’s per-
formance. However, in the case of high noise in the model
error (aleatoric uncertainty), the fact that PICP is far from
90 % could mean simply that none of the data-driven predic-
tive models can capture the input–output dependencies and
predict quantiles accurately. For comparative studies, how-
ever, PICP can very well be used: the method with PICP
closest to 90 % should be seen as the best (with some toler-
ance). Additional analysis may be carried out to see whether
the methods developed for the assessment of the probabilistic
forecast quality can be used (Laio and Tamea, 2007) (it is not
exactly the same as the residual uncertainty analysed here,
but the mathematical apparatus seems to be transferrable). In
this paper, however, we have not considered these, so they
can be recommended for exploration and testing in the future
studies.
It is also worth mentioning that all considered measures
are averages and so should be used together with the uncer-
tainty bound plots of which visual analysis reveals more in-
formation on the capacity of different uncertainty prediction
methods during particular periods.
3 Application
3.1 Case studies
3.1.1 Brue catchment
Located in the southwest of England, the Brue River catch-
ment has a history of severe flooding. Draining an area of
135 km2 to its river gauging station at Lovington (Fig. 3a),
the catchment is predominantly rural and of modest relief
and gives rise to a responsive flow regime due to its soil
properties. The major land use is pasture on clay soil. The
mean annual rainfall in the catchment is 867 mm and mean
river flow is 1.92 m3 s−1 (basin average, 1961–1990) (Ta-
ble 1). This catchment has been extensively used for research
on weather radar, quantitative precipitation forecasting and
rainfall–runoff modelling, as it has been facilitated with a
dense rain-gauge network (see e.g. Moore et al., 2000; Bell
and Moore, 2000).
The flow in the Brue River was simulated by the HBV-96
model (Lindström et al., 1997), which is an updated version
of the HBV rainfall–runoff model (Bergström, 1976). This
lumped conceptual hydrological model consists of subrou-
tines for snow accumulation and melt (excluded for Brue),
the soil moisture accounting procedure, routines for runoff
generation, and a simple routing procedure (Fig. 3b). The in-
put data used are hourly observations of precipitation (basin
average), air temperature, and potential evapotranspiration
(estimated by the modified Penmann method) computed
from the 15 min data. The model time step is 1 h (1t = 1 h).
The model is calibrated automatically using the adaptive
cluster covering algorithm (ACCO) (Solomatine et al., 1999).
The data sets used for calibrating and validating the HBV-
96 model are based on Shrestha and Solomatine (2008). It
should be mentioned that the discharge data on calibration
have many peaks which are higher in magnitude compared
to those in the validation data.
The uncertainty analyses conducted for the Brue catch-
ment are based on one-step-ahead flow estimates, i.e.
LT= 1 h (simulation mode). Effective rainfall (rainfall minus
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3187
Table 1. Summary of the main basin characteristics for the catchments selected.
Catchment name Drainage area Elevation Mean flow Mean annual rainfall Highest river level recorded Basin lag time
(km2) (m) (m−3 s−1) (mm) (m) (h)
Brue 135 ≈ 20 1.92∗ 867∗ 4.45∗∗∗ 8–9
Yeaton 180.8 61.18 1.60∗∗∗∗ 767∗∗∗∗ 1.13∗∗∗ 15–20
Llanyblodwel 229 77.28 6.58∗∗∗∗ 1267∗∗∗∗ 2.68∗∗∗ 7–10
Llanerfyl ≈ 100 151 > 10∗∗ > 1300∗∗ 3.59∗∗∗ 3–5
∗ Basin average for the period 1961–1990.∗∗ Rough estimates based on the data available for 2006–2013.∗∗∗ http://apps.environment-agency.gov.uk/river-and-sea-levels/∗∗∗∗ Computed for the periods 1963–2005 and 1973–2005 for Yeaton and Llanyblodwel, respectively and taken from the UK Hydrometric Register (Marsh and Hannaford,
2008).
Figure 3. (a) The Brue catchment showing a dense rain-gauge network and its river-gauging station, Lovington, where the discharge is
measured, and (b) schematic representation of the HBV-96 model (Lindström et al., 1997) with the routine for snow (upper), soil (middle),
and response (bottom) (Shrestha and Solomatine, 2008).
evapotranspiration) values were used instead of using rainfall
data directly.
3.1.2 Upper Severn catchments: Yeaton, Llanyblodwel,
and Llanerfyl
Flowing from the Cambrian Mountains (610 m) in Wales, the
River Severn is the longest river in Britain (about 354 km).
It forms the border between England and Wales and flows
into the Bristol Channel. The river drains an area of approx-
imately 10 500 km2 above the monitoring station at Upton
on Severn. Mean annual precipitation ranges from approxi-
mately 2500 mm in the west to less than 700 mm in the south
(EA, 2009). The Upper Severn includes rock formations clas-
sified as non-aquifers as well as loamy soils characterized
by their high water retention capacity (for a more detailed
description of the Upper Severn, see Hill and Neal, 1997).
Flooding is a major problem at the downstream due to ex-
cessive rainfall at the upstream (the Welsh hills), early 2014
floods being the most recent significant floods that occurred.
In this work, the three sub-catchments of the Upper Sev-
ern River are analysed: Yeaton, Llanyblodwel, and Llanerfyl
(Fig. 4). The area, elevation, mean flow, mean annual rainfall
and basin lag time (time of concentration) information of the
catchments are presented in Table 1. The Yeaton catchment
is located at a lower elevation and over a flat area compared
to Llanerfyl and Llanyblodwel. This catchment also has the
longest basin lag time. The smallest catchment in terms of
drainage area is Llanerfyl, which also has the shortest basin
lag time (approx. 3–5 h) leading to flash floods, so that the
predictive uncertainty information on flood forecast for this
catchment has especially high importance.
In the Midlands Flood Forecasting System (MFSS; a
Delft-FEWS forecast production system as described in
Werner et al., 2013), the Upper Severn catchment is repre-
sented by a combination of numerical models for rainfall–
runoff modelling (MCRM; Bailey and Dobson, 1981), hy-
drological routing (DODO; Wallingford, 1994), hydrody-
namic routing (ISIS; Wallingford, 1997), and error correction
(ARMA). The input data used within MFSS include (a) real-
time spatial data (observed water level and rain-gauge data as
well as air temperature and catchment average rainfall); (b)
radar actuals; (c) radar forecasts; and (d) numerical weather
prediction data (all provided by the UK Meteorological Of-
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3188 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Figure 4. The Upper Severn catchments: Yeaton, Llanyblodwel and
Llanerfyl.
fice). The data available were split into two parts for calibra-
tion (7 March 2007, 08:00 UTC–7 March 2010, 08:00 UTC)
and validation (7 March 2010, 20:00 UTC–7 March 2013,
08:00 UTC), preserving similar statistical properties in both
data sets.
The forecasting system issues two forecasts per day (08:00
and 20:00 UTC) with a time horizon of 2 days. First, the es-
timates of internal states are obtained by running the models
(which are forced with observed precipitation, evapotranspi-
ration and temperature) in historical mode over the previous
period. The state variables for the (hydrological) model are
soil moisture deficit (SMD, the amount of water required to
bring the current soil moisture content to field capacity in the
root zone), groundwater level (GW), snow water equivalent
(SWE), and snow density (SD). Using a stand-alone version
of MFSS, the system (forced by the forecasted precipitation)
is then run forward with a time step of 1 h.
It is important to note that this case study, unlike the Brue
catchment, includes errors in the meteorological forecast and
the back transformation of discharge to water level – via a
rating curve – in a lumped manner. Therefore, the effects
of rating curve uncertainty (Di Baldassarre and Montanari,
2009; Sikorska et al., 2013; Coxon et al., 2014; Mukolwe
et al., 2014) and precipitation forecast uncertainty (Kobold
and Sušelj, 2005; Shrestha et al., 2013) are accommodated
as well.
The uncertainty analysis is aimed at estimating predictive
uncertainty for the forecast time series (1t = 12 h) corre-
sponding to the lead time of interest. In this study, we con-
sider the lead times LT= 1, 3, 6, 12, and 24 h only.
3.2 Experimental set-up
In all case studies the QR uncertainty prediction method em-
ploys a linear regression model. While in the Brue catchment
the linear regression model estimates the quantile τ of ob-
served discharge conditioned on simulated discharge, in the
Upper Severn catchments the linear regression model esti-
mates the quantile τ of the observed water level conditioned
on the forecasted water level. In UNEEC the M5 model tree
is employed as the prediction model. Selection of the best set
of the input variables for UNEEC is based on AMI and corre-
lation analysis, and the number of clusters is identified by the
model-based optimization. UNEEC is configured differently
for each case, as described below.
3.2.1 Brue catchment
Shrestha and Solomatine (2008) tested the UNEEC method
on the Brue catchment to assess residual uncertainty of the
one-step-ahead flow estimates. The predictors of model er-
ror identified using AMI and correlation analysis were only
lagged discharge (Qt−1,Qt−2,Qt−3) and effective rainfall
(REt−8, REt−9, REt−10) values. In this study, however, we
try a different set of predictors. In addition to the mentioned
variables, we also consider the two most recent past error
values (et−1, et−2), allowing thus for incorporation of the au-
toregressive features (for this case study it paid off – MPI
values decrease (< 5 %) during both training and test peri-
ods). As in the previous study, the number of clusters used
was five.
3.2.2 Upper Severn catchments: Yeaton, Llanyblodwel,
and Llanerfyl
In the Upper Severn case studies, a variety of predictors are
considered for the model, e.g. observed and modelled wa-
ter level, forecasted precipitation, and state variables (GW,
SMD, SWE, SD). Although the benefits of using the soil
moisture (observed or modelled) and groundwater level in-
formation for modelling rainfall–runoff processes and pre-
dicting runoff are well known in the literature (Aubert et
al., 2003; Lee and Seo, 2011; Tayfur et al., 2014), we can-
not cite any studies exploring the possible advantages of us-
ing such information for improving predictive capabilities of
uncertainty analysis methods. Therefore, the dependence of
model residuals on variables expressing the internal state of
the catchments is also analysed.
Among the state variables, the most significant correlation
with the model error was shown by GW and SMD. While
GW was found to be positively correlated with model resid-
uals (i.e. as GW increases, error increases too), SMD and
model error had a negative correlation. The positive correla-
tion between GW and model residuals can be explained by
the fact that high groundwater levels are associated with ex-
cessive precipitation during which model errors are higher
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3189
in magnitude. High soil moisture deficit, on the other hand,
indicates that there has been no excessive precipitation and
that the soil is not filled up with infiltrated water. High evap-
oration rates (causing soil to dry up) can also result in high
soil moisture deficit. It should be noted that the latter is less
likely to be valid for the Upper Severn catchments consider-
ing the prevailing climate in the region. Accordingly, lower
soil moisture deficit is linked with excessive precipitation
events such that soil moisture deficit is negatively correlated
with the model error.
Eventually, on the basis of studying the correlations and
AMI between various candidate predictors and the output,
and using expert judgement, the following variables have
been chosen to serve as candidate predictors:
– the most recent precipitation (Pt−1),
– the observed water level (Hobs,t−1),
– error (et−1),
– state variables GW and SMD.
It should be noted that subscript t−1 denotes the 12 h de-
lay since the data sets analysed have a time step of 12 h (see
Sect. 3.1.2).
In an attempt at removing least influential inputs, the set of
variables above was then subjected to the model-based opti-
mization: the degree of influence of various inputs has been
explored by running the UNEEC predictor for different sets
of inputs and comparing the resulting PICP and MPI. It was
found that there were only negligible changes (and mostly no
change) when Pt−1 and et−1 were included or not. Based on
this analysis, these two variables have been excluded from
the further experiments, and only the variables GW, SMD,
and Hobs,t−1 have been used as predictors. Inclusion of GW
was important since this variable provides more explainable
results in terms of PICP and MPI. It should be noted that us-
ing GW and SMD can be considered as a proxy for using the
rainfall information.
Fuzzy clustering in UNEEC is carried out by the fuzzy
c-means method and employs six clusters with the fuzzy ex-
ponential coefficient set to 2. The number of clusters was
chosen based on computation of the Partition Index (SC), the
Separation Index (S) and the Xie and Beni Index (XB) (Ben-
said et al., 1996; Xie and Beni, 1991). (It should be men-
tioned that the sensitivity of PICP and MPI to different num-
bers of clusters supports the choice of six clusters.)
Within the variables considered in clustering, GW is the
most influential one. Fig. 5 shows fuzzy clustering of GW,
SMD, and Hobs,t−1 data for the Llanyblodwel catchment
(lead time 6 h). This figure also contains the plot of model
residuals against GW, where one can observe heteroscedas-
ticity of model residuals with respect to GW. As can be easily
seen, while Cluster 2 is associated with very high groundwa-
ter levels, Cluster 4 is associated with the low groundwater
level conditions, which might occur due to the low water lev-
els in the river and/or high soil moisture deficit.
It must be noted that in this study the hydrological model
output is not included as yet another input to UNEEC (along
with the observed discharge/water level) in all case studies.
However, it may be worth exploring this idea in the further
studies.
4 Results and discussion
This part focuses on statistical error analysis (Sect. 4.1) and
comparison of uncertainty analysis results (Sect. 4.2).
4.1 Statistical error analysis
Understanding the quality of hydrological model quality (e.g.
water level forecasts) is important in order to discuss uncer-
tainty analysis results provided by any method. For this pur-
pose, we analyse the error time series statistically. We also
check the homoscedasticity (the assumption which simpli-
fies the mathematical and computational treatment of ran-
dom variables) of the model residuals. Furthermore, we in-
vestigate the normality of model residuals through probabil-
ity plots of the normal distribution and the t location-scale
distribution, whose pdf is given by Eqs. (14) and (15), re-
spectively:
f (x)= (1/σ ·√
2π ) · e−(x−µ)2/
2σ 2 (14)
f (x)=0(ν+ 1
/2)
σ ·√
2π ·0 ·(ν+ 1
/2) ·[
ν+ (x−µ /σ )2
ν
]−(ν+1/2 )
, (15)
where µ is the location parameter (mean), σ is the scale pa-
rameter (standard deviation), ν is the shape parameter (i.e.
the number of degrees of freedom), and 0 is the gamma func-
tion. The t location-scale distribution is similar to the normal
distribution but has heavier tails, making it more prone to
outliers. Within this study outliers refer to very high model
residuals occurring during extreme precipitation and flow
events. In the case of normality of data, its analysis becomes
much simpler; however, often this is not the case.
Residual uncertainty varies in time and with the changing
hydrometeorological situation, so in this paper we investi-
gate the residual distribution for different hydrometeorologi-
cal conditions represented by clusters found within the UN-
EEC method (on the training data set).
4.1.1 Brue catchment
The observed discharge plotted against simulated discharge
during calibration and validation periods can be seen in
Fig. 6a and c, respectively. During calibration, although the
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3190 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Figure 5. Fuzzy clustering of GW (left, top) and its relation to the model residuals (right), SMD (left, middle) and Hobs, t−1 (left, bottom)
for the calibration period (7 March 2007, 08:00 UTC–7 March 2010, 08:00 UTC) – Llanyblodwel, lead time 6 h.
Figure 6. Observed discharge, simulated discharge and model residuals during calibration and validation (Brue catchment).
model residuals are lower at flows higher than 35 m3 s−1
compared to at flows less than 35 m3 s−1 (in Fig. 6a), it can
be seen from Fig. 6c that the HBV-96 model is less accurate
in simulating high flows compared to low flows. It is also
noteworthy to mention that in calibration (Fig. 6a) there is
higher dispersion around the diagonal line than in validation
(Fig. 6c).
Figure 6b and 6d shows how model residuals change with
increasing discharge values during calibration and validation
periods, respectively. Clearly, model residuals of the Brue
catchment are heteroscedastic; that is to say, the variance of
model residuals varies with the effect being modelled, i.e.
observed discharge.
Figure 7 presents probability plots for model residuals dur-
ing training. The top left plot compares the two selected
distributions (normal distribution and t location-scale dis-
tribution). The estimated parameters for the best fit to data
are µ= 0.0363 m3 s−1 and σ = 0.7619 m3 s−1 for normal
distribution – same with the empirical parameters. On the
other hand, the best fit parameters for t location-scale distri-
bution are different: µ= 0.0607 m3 s−1, σ = 0.2351 m3 s−1
and ν = 1.5833. From this figure, one can conclude that the
model residuals’ distribution is far from being close to nor-
mal even though the parameters of the fitted normal distribu-
tion are the same as those obtained from the empirical dis-
tribution. It is obvious that t location-scale distribution pro-
vides a better fit as it is able to enclose the data at the tails
much better compared to a fitted normal distribution. Yet, the
outliers are still not represented fully.
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3191
Figure 7. Probability plots for model residuals (during training) for the Brue catchment: comparison of the two fitted distributions: normal
vs. t location-scale distribution (top left), and the clusters.
Normality of the model residuals’ distribution is further
investigated for different hydrometeorological conditions as
identified by clustering in the space of the predictor variables.
Analysis of the probability plot for each cluster formed in-
dicates that there is no significant departure from normality
(with regard to the fitted normal distribution), unlike in the
overall model residuals. The most striking result among all
clusters is achieved in the one representing very high flow
and high rainfall (Cluster 4, 0.95 % of total data) (Fig. 7, bot-
tom middle). The distribution of all the other clusters (Clus-
ters 1, 2, 3, and 5) was found to be more or less equally close
to normal. When visually compared, these distributions were
only slightly less close to normal with respect to Cluster 4.
4.1.2 Upper Severn catchments: Yeaton, Llanyblodwel,
and Llanerfyl
The quality of (water level) forecasts is assessed based on
standard deviation of model error. The results are compara-
tively presented for different lead times in Fig. 8. It can be
clearly seen that during both calibration and validation as
lead time increases, the standard deviation of error increases
as well. Also, it should be noticed that there is a direct in-
creasing effect of shorter basin lag time on standard devia-
tion. For example, the catchment with the shortest basin lag
time, that is Llanerfyl, always has a larger standard devia-
tion for all lead times. On the contrary, the smallest standard
deviation always occurs in the catchment with the longest
basin lag time, which is Yeaton. This is mainly due to the
fact that the basin lag time represents the memory of a catch-
ment. Hence, the flood forecasting capability of a hydrolog-
ical model is affected negatively when the basin lag time is
short.
The observed water levels are plotted against forecasted
water levels in the Llanyblodwel catchment during calibra-
tion and validation for lead time 6 h in Fig. 9a and c, respec-
tively. Figure 9b and d shows model error plotted against
observed water level on the logarithmic scale. Although it
is not very clear from Fig. 9a (and Fig. 9c), it is evident
from Fig. 9b (and Fig. 9d) that the model error increases
with higher water levels, as expected. This confirms the het-
eroscedasticity of model residuals.
Normality of model residuals for the Llanyblodwel catch-
ment for all lead times was investigated (see Fig. 10, top
left). Visual inspection of probability plots, superimposed on
which the line joining the 25th and 75th percentiles of the fit-
ted normal distributions, reveals that errors are not normally
distributed, i.e. the data do not fall on the straight line, as is
especially the case for the tails. It should be realized that the
departure from normality increases with longer lead times.
The top right plot in Fig. 10 compares the two selected dis-
tributions (normal distribution and t location-scale distribu-
tion) for model residuals during training. It can be concluded
that neither the normal distribution nor the t location-scale
distribution provides a good fit to the data.
Furthermore, a normality check for model residuals’ dis-
tribution is made individually for the data clusters cor-
responding to particular hydrometeorological conditions.
The variables used for clustering are groundwater level
(GW), soil moisture deficit (SMD), and observed water level
(Hobs, t−1). It is seen that the level of achieving normality
in model residuals’ distribution for each cluster is substan-
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3192 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Figure 8. Standard deviation of model error during calibration and validation (Upper Severn catchments).
Figure 9. Observed water level, forecasted water level and model residuals during calibration and validation (Llanyblodwel, lead time 6 h).
tially poorer if compared to the Brue catchment. This can
be explained by the fact that the error time series data being
analysed have a time step of 12 h which is long enough to
hinder the effects of varying water levels on error. Another
reason can be related to the nature of model residuals, e.g.
forecasted precipitation is used to predict water levels. This
brings a great amount of uncertainty and a higher difference
between the actual and the predicted water levels (i.e. higher
model residuals). It is also worth mentioning that the distri-
bution closest to normal is found in the data cluster repre-
senting high groundwater levels, high water levels, and low
soil moisture deficit (Cluster 2, comprising 4.6 % of the total
data set) (Fig. 10, middle). Distributions of Clusters 1, 3, 4,
5, and 6 are far from normal.
Both the Brue and Llanyblodwel case studies indicate that
it is not possible to understand the origin of the model error
in uncertainty assessment by looking at the probability plots
of model residuals for each cluster. However, it is worth-
while mentioning that it is mostly the extreme events which
make the overall distribution non-Gaussian. Classifying data
so that different hydrometeorological conditions (most im-
portantly, the extreme events) are separated helps to achieve
homogeneity, and thus normality in model residuals’ distri-
bution. Therefore clustering can be suggested as an alterna-
tive to transformation of model residuals before applying any
statistical methods to them.
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3193
Figure 10. Probability plots for model residuals (during training) for the Llanyblodwel catchment: comparison of fitted normal distributions
for all the lead times (top left); comparison of the two fitted distributions: normal vs. t location-scale distribution (top right) and the clusters
(lead time 6 h).
Figure 11. Comparison of prediction limits for the 90 % confidence level during validation: (a) for the highest peak event (16 December 1995,
04:00 UTC–28 December 1995, 16:00 UTC), and (b) for a medium peak event (6 January 1996, 00:00 UTC–18 January 1996, 12:00 UTC).
4.2 Uncertainty prediction by QR and UNEEC
Uncertainty analysis results from both methods are evaluated
and compared employing the validation measures explained
in Sect. 2.2.
4.2.1 Brue catchment
Validation measures PICP, MPI, and ARIL are provided in
Table 2. In terms of PICP, even though QR provides PICP
values slightly closer to 90 and 50 % during training, UN-
EEC was found to be more reliable in validation, especially
for the 90 % CL. While the narrowest prediction interval on
average is given by UNEEC during training for both 90 and
50 % CL, comparable MPI values are obtained during valida-
tion. QR has smaller ARIL values, particularly for the 90 %
CL. However, on aggregate UNEEC yields better results over
QR, especially in validation.
Looking at Fig. 11a, visual analysis of 90 % prediction in-
tervals for the highest flow period in validation reveals that
neither UNEEC nor QR is perfectly able to enclose the ob-
servations of high flows. Overall, in validation, the analysis
results from UNEEC and QR are comparable for the highest
peak event (Table 2). For medium peaks in validation, how-
ever, QR produces wider uncertainty bounds in comparison
to UNEEC. This is illustrated in Fig. 11b. For this medium
peak event it should be noted that the higher MPI (and ARIL)
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3194 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Table 2. Uncertainty analysis results for 90 and 50 % confidence levels (Brue catchment).
Confidence level PICP (%) MPI (m−3 s−1) ARIL (–)
UNEEC QR UNEEC QR UNEEC QR
TR90 % 91.19 90.00 1.58 1.69 1.86 1.47
50 % 51.28 50.01 0.54 0.58 0.55 0.46
VD90 % 88.29 82.33 1.37 1.39 2.35 1.83
50 % 30.29 32.75 0.45 0.47 0.67 0.57
VD (highest peak event)90 % 57.14 62.79 2.86 3.47 0.66 0.78
50 % 27.91 30.90 1.06 1.28 0.24 0.27
VD (medium peak event)90 % 88.04 87.04 2.36 2.75 0.51 0.61
50 % 55.81 50.50 0.90 1.00 0.20 0.22
TR: training; VD: validation.
Table 3. PICP, MPI, and ARIL values for each cluster (training, 90 % confidence level, Brue): UNEEC vs. QR.
Cluster no. Number of data UNEEC QR
PICP (%) MPI (m3 s−1) ARIL (–) PICP (%) MPI (m3 s−1) ARIL (–)
1a 5447 (62.3 %) 92.12 1.14 2.67 88.16 0.88 1.96
2 787 (9.0 %) 82.08 2.98 0.50 84.5 3.51 0.57
3 2167 (24.7 %) 94.46 1.44 0.53 96.72 1.94 0.71
4b 83 (0.95 %) 74.70 7.55 0.33 90.36 12.00 0.49
5 266 (3.05 %) 77.44 5.96 0.48 89.47 7.58 0.58
a Low flows, low rainfall.b High flows, high rainfall.
value by QR is not manifested in PICP – both methods have
very close PICP values (Table 2). One of the reasons for this
may relate to the fact that by design UNEEC uses more pre-
dictors that explain the (past) catchment behaviour and hence
is able to “memorize” catchment behaviour better, and this is
especially pronounced during the longer periods of medium
flows rather than during high flows having shorter duration.
We have also compared performance of QR and UNEEC
for each cluster found by UNEEC during training. Unlike
for the whole data set (which is highly heterogeneous due
to extremes in rainfall–runoff processes), analysis for each
individual cluster focuses on more homogeneous data sets.
Table 3 shows the corresponding PICP, MPI and ARIL. In
general, it is difficult to decide which method is better – re-
sults are mixed. However, there is one observation that can be
made. For most clusters there is a dependency between PICP
and MPI: typically the higher MPI corresponds to PICP be-
ing closer to the CL (90 %). This may be explained by the
fact that for narrow MPIs PICP would be under “pressure”
and be lower (however, it would be difficult to generalize).
For example, for the high flow cluster (Cluster 4), QR ap-
pears to be better in terms of PICP, whereas UNEEC ends up
with very narrow MPI, and this is probably the reason why
its PICP could not reach 90 % CL.
The reported comparison was done for the clusters found
by UNEEC during training. In principle, a similar compar-
ison can also be made for the homogeneous groups of data
in the validation set; however, this may not have much sense
since this set imitates the model in operation, and in opera-
tion all models are run for individual input vectors at each
time step of the model run, and not for the whole set of data
(so the “validation set” in operation will never exist).
4.2.2 Upper Severn catchments: Yeaton, Llanyblodwel,
and Llanerfyl
For these catchments, in order to reflect performance for dif-
ferent lead times better, we are using the graphical represen-
tation of results.
Figure 12 shows the PICP values plotted against the MPI
for the calibration and validation periods. The most impor-
tant general conclusion is that both methods show excellent
results in terms of PICP for 90 % CL. For the 50 % CL the
results seem to be worse, especially for UNEEC – but the
reader should take into account that for the low lead times
the hydrological models are very accurate; hence MPI is ex-
tremely narrow (especially for 50 % CL) and it is no surprise
that PICP cannot be accurately calculated. Furthermore, for
the 90 % CL, the following can be said: for Yeaton, QR does
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3195
Figure 12. Comparison of UNEEC and QR based on both PICP and MPI during calibration period (7 March 2007, 08:00 UTC–7 March
2010, 08:00 UTC) and validation period (7 March 2010, 20:00 UTC–7 March 2013, 08:00 UTC) for the 90 % and 50 % confidence levels.
(The size of the marker represents the lead time; i.e. the bigger the marker, the longer the lead time.)
Figure 13. MPI (left) and ARIL (right) values obtained during calibration period (7 March 2007, 08:00 UTC–7 March 2010, 08:00 UTC)
and validation period (7 March 2010, 20:00 UTC–7 March 2013, 08:00 UTC) for the 90 % confidence level.
slightly better than UNEEC; for Llanyblodwel, both methods
are equally good; for Llanerfyl, the UNEEC method is a bit
better than QR.
For the further analysis, Fig. 13 presents MPI and ARIL
values for the 90 % CL on calibration and validation data
sets. It can be seen that with the increase in the lead time, the
forecast error obviously increases, and the values of both in-
dicators follow. In view of the (high) model accuracy, the rel-
atively low MPI values in the Yeaton catchment are not sur-
prising for both methods. Overall, the results are mixed: for
some catchments, QR is marginally better; for other catch-
ments, UNEEC has the higher performance.
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3196 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Figure 14. Comparison of prediction limits for the 90% confidence level during validation (1 April 2012–7 March 2013): Yeaton, lead time
3 h (top); Llanyblodwel, lead time 6 h (middle); and Llanerfyl, lead time 12 h (bottom).
Figure 15. Comparison of prediction limits for the falling limb part of the hydrographs (medium water levels) for the 90 % confidence level
during validation: (a) Yeaton, lead time 3 h; (b) Llanyblodwel, lead time 6 h; and (c) Llanerfyl, lead time 12 h.
For the further comparison of estimated prediction limits
through uncertainty plots, three cases are selected based on
the relationship between basin lag time and lead time. These
cases are (1) Yeaton, lead time 3 h (lead time < basin lag
time), (2) Llanyblodwel, lead time 6 h (lead time ≈ basin lag
time), and (3) Llanerfyl, lead time 12 h (lead time > basin lag
time). The fundamental idea here is to understand how well
the residual uncertainty is assessed with regard to forecast
lead time and its relation to basin lag time. The catchment
with the longest basin lag time (Yeaton) is considered for
Case 1, where the effect of a very short lead time is to be in-
vestigated. Here in this decision, there is the deliberate inten-
tion to combine the condition of having more accurate model
outputs (i.e. extremely small residuals) as well. Case 3, on
the other hand, is important to understand lead time-basin lag
time relationship for the worst situation: relatively poor qual-
ity of the forecasting model and the longest lead time. This
is the most critical case since the performance of predictive
uncertainty method’s performance has a bigger role in the
operational decision-making process. Apart from these two
extreme cases, Case 2 represents a balanced situation where
the lead time of interest and basin lag time are approximately
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3197
Table 4. PICP, MPI, and ARIL values for MEDIUM water levels (validation, 90 % confidence level): UNEEC vs. QR.
Catchment Water level Number of data UNEEC QR
PICP (%) MPI (m) ARIL (–) PICP (%) MPI (m) ARIL (–)
Yeaton 0.3–0.6 m 281 (13 %) 82.56 0.0212 0.054 86.48 0.0299 0.074
Llanyblodwel 0.5–0.8 m 540 (25 %) 89.63 0.1377 0.223 93.52 0.1680 0.269
Llanerfyl 1.3–1.6 m 570 (26.5 %) 84.91 0.4156 0.297 85.09 0.5572 0.398
equal. The Llanyblodwel catchment is chosen for this case as
its model has a moderate predictive accuracy. Fig. 14 com-
pares the computed prediction limits by QR and UNEEC for
these cases during the latest 11-month period of validation
(April 2012–February 2013). It was during late 2012 that the
Upper Severn catchment suffered from serious flooding, and
this period corresponds to the right half of the plots. The most
salient observations from Fig. 14 are as follows:
– In Llanerfyl, one can notice a strange behaviour of the
model causing sharp changes in forecasted water lev-
els (unstable model outputs), and thus in prediction
limits. Considering that the Llanerfyl catchment has a
basin lag time of ∼ 3–5 h, hydrological conditions in
the catchment, e.g. water levels, can change signifi-
cantly in 12 h (1t , time step of the data set). Therefore,
it is not surprising that the sharpest changes occur in
this catchment’s hydrograph as compared to Yeaton and
Llanyblodwel. One can observe even more significant
changes in the second half period of the hydrograph. It
is necessary to mention that these oscillating changes
appear as a consequence of the forecasting model’s ex-
tremely poor performance.
– For the low water levels in Yeaton and Llanyblodwel,
UNEEC gives wider prediction intervals as compared
to QR. A possible explanation for this can be encap-
sulation of groundwater level information in UNEEC.
Groundwater levels remain at higher levels for longer
periods than water levels in the river (i.e. due to slow
and long response times of groundwater levels to chang-
ing hydrometeorological conditions). Thus, using GW
as an input variable in its non-linear model, UNEEC has
the potential to provide an uncertainty band of larger
widths for water levels when the groundwater level is
high.
– For the medium water levels in Yeaton and Llany-
bldowel, QR gives wider prediction intervals as com-
pared to UNEEC, which is confirmed by the higher
MPI and ARIL (without any significant improvements
in PICP) values for QR (Table 4) obtained for medium
water levels. This is particularly true on the falling limb
part of the hydrographs as exemplified in Fig. 15a and b
(for Yeaton and Llanyblodwel, respectively). The aver-
age of the MPI values corresponding to three examples
shown from Yeaton and Llanyblodwel, respectively, are
0.0204 and 0.0201 m for UNEEC, whereas for QR it is
0.0418 and 0.0295 m.
– For peak water levels in the Yeaton and Llanyblodwel
catchments, it is mostly QR that produces a higher up-
per prediction limit than UNEEC. Yet, this does not con-
tribute to the overall performance of the method signifi-
cantly. On the contrary, it is seen in some cases that such
high upper prediction limits make the uncertainty band
unnecessarily wide.
– Continuous peaks prevail in the Llanerfyl catchment (as
its basin lag time is far shorter than the forecast lead
time of interest). Such continuous peaks occur during
certain periods in the Llanyblodwel catchment too. In
most of these cases, UNEEC gives narrower uncertainty
band, and wider prediction interval computed by QR
is redundant. That is to say, it does not contribute QR
method’s performance (as measured by PICP) at all in
terms of its ability to enclose more observations within
the band. For peak water levels, however, QR is slightly
more informative than UNEEC.
– Noticeably, upper prediction limits obtained by QR in
the Llanerfyl catchment for the long-lasting falling limb
part of the hydrograph (indicated by arrows in Fig. 14c)
are too high, e.g. even greater than those provided by
UNEEC. QR (in this study, by design) is a method
building simple linear regression models considering
only observed water levels on forecasted water levels.
Having a rather simple mathematical formulation, it
might be that the sensitivity of the computed upper pre-
diction limit to the magnitude of water level increases,
and shows an amplifying effect on uncertainty band
width.
Table 5 shows the values of validation measures (PICP,
MPI, and ARIL) for each cluster (obtained during training)
for the Llanyblodwel catchment (lead time 6 h). For flood
management the cluster 2 (4.6 % of all data) – with the high
groundwater levels, and hence potentially corresponding to
flood conditions – could be the most interesting one. In UN-
EEC, the highest MPI value was obtained for this cluster
with a relatively bad PICP value compared to the other clus-
ters. Similar to UNEEC, the largest MPI was obtained for
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3198 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Table 5. PICP, MPI, and ARIL values for each cluster (training, 90 % confidence level, Llanyblodwel, lead time 6 h): UNEEC vs. QR.
Cluster no. Number of data UNEEC QR
PICP (%) MPI (m) ARIL (–) PICP (%) MPI (m) ARIL (–)
1 413 (19.1 %) 88.62 0.1492 0.271 93.95 0.1506 0.250
2a 100 (4.6 %) 85.00 0.2964 0.288 95.00 0.3538 0.326
3 336 (15.5 %) 90.18 0.1798 0.249 94.94 0.2283 0.287
4b 359 (16.6 %) 93.04 0.0518 0.182 89.14 0.0305 0.100
5 535 (24.8 %) 89.53 0.1128 0.308 85.79 0.0742 0.179
6 416 (19.2 %) 90.38 0.0920 0.212 92.31 0.1021 0.208
a High groundwater levels;b low groundwater levels.
this cluster with the QR method also. Both methods provide
equally bad PICP values. Giving a wider uncertainty band
than UNEEC on average, QR is less capable of estimating
reasonable prediction limits for very high groundwater lev-
els. This is also supported by its greater (12 %) ARIL value
compared to UNEEC.
PICP and MPI values for Cluster 4 should be mentioned
as well. This cluster represents the situations with the very
low water levels, very low groundwater levels, and very high
soil moisture deficit, and constitutes 16.6 % of all the data.
In comparison to UNEEC, QR provides a PICP value very
close to 90 % CL despite its slightly lower MPI. Thus, one
can say that UNEEC fails in providing reliable uncertainty
estimates for the extreme condition associated with very low
water and groundwater levels. This can be due to the effect
of using state variables as predictors. All in all, the state vari-
ables are calculated by the model and they cannot reflect real
catchment conditions accurately, especially when the (hydro-
logical) model is not very accurate. That is particularly true
for the extreme events, considering that models mostly fail in
simulating such events.
Overall, UNEEC is worse than QR on for one cluster but
better or equal on all other clusters; however, in general, both
methods in terms of PICP show reasonably good results.
5 Conclusions and recommendations
This study should be seen as accompanying the study by
López López et al. (2014) (and earlier work on UNEEC and
QR) and presents a comparative evaluation of uncertainty
analysis and prediction results from QR and UNEEC meth-
ods on the four catchments that vary in hydrological char-
acteristics and the models used: Brue catchment (simulation
mode) and Upper Severn catchments – Yeaton, Llanyblod-
wel, and Llanerfyl (forecasting mode). The latter set of case
studies is important from a practical perspective in that the
effect of lead time on uncertainty analysis results and its re-
lation to basin lag time is demonstrated. For both QR and
UNEEC different model configurations than their previous
applications are considered. One of reasons to compare these
two methods was to understand if a simpler linear method
(QR) using fewer input data performs well compared to a
more complex (non-linear) method (UNEEC) with more pre-
dictors. The following conclusions can be drawn from the
results of this study:
– In terms of easiness of set-up (data preparation and cal-
ibration), preference should be given to QR simply be-
cause it is a simpler linear method with one input vari-
able (in this study), whereas UNEEC has more steps and
requires more data analysis. However, the model set-up
is carried out only once, and in operation both methods
can be easily used and both have very low running times
(a fraction of a second on a standard PC) since they are
based on algebraic calculations.
– In almost all case studies both methods adequately rep-
resent residual uncertainty and provide similar results
consistent with the understanding of the hydrological
picture of the catchment and the accuracy of the (hydro-
logical) models used. We can recommend both methods
for use in flood forecasting.
– In one case study, Llanerfyl, we found that UNEEC was
giving more adequate estimates than QR. This catch-
ment has a shorter basin lag time and the model out-
puts for this catchment were characterized by a rela-
tively high error, so our conclusion was that probably
in such a rapid response catchment the UNEEC’s more
sophisticated non-linear models were able to capture re-
lationships between the hydrometeorological and state
variables, and the quantiles better than the QR’s linear
model.
– A useful finding is that inclusion of a variable repre-
senting groundwater level (GW) as a predictor in UN-
EEC improves its performance for the Upper Severn
catchments. This can be explained by the fact that this
variable has a high level of information content about
the state of a catchment. However, it should be noted
that, in other catchments, using such information can
be misleading due to slow (and long) response times
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3199
of groundwater levels to changing hydrometeorological
conditions. Yet, overall, it can be advised to make use of
variables which can be representative of the hydrolog-
ical response behaviour of a catchment for improving
the predictive capacity of data-driven methods.
There are limitations of the presented research (aspects that
have not been taken into account in this paper due to the time
and project setting constraints), which can also be seen as
recommendations for the future research.
We recommend comparing the two presented methods
(QR and UNEEC) with more predictive uncertainty methods
which use different methodologies, such as HUP (Krzyszto-
fowicz, 1999), the more recent MCP (Todini, 2008) and
DUMBRAE (Pianosi and Raso, 2012). Yet another recom-
mendation (induced by the referee’s and Editor’s sugges-
tions) is to extend the list of the possible performance mea-
sures and to test the applicability of the methods developed
for the assessment of the probabilistic forecast quality (Laio
and Tamea, 2007) whose mathematical apparatus is trans-
ferrable to the problem of residual uncertainty prediction.
It can also be recommended to test capabilities of differ-
ent predictive uncertainty methods on theoretical cases with
the known distributions, as well as on the catchments of dis-
tinct hydrologic behaviour, with diverse climatic conditions,
and having various hydrological features. In this study, we
found that the basin lag time is a notable characteristic of a
catchment having great influence on uncertainty analysis re-
sults (as measured by PICP and MPI). When the lag time is
longer, the catchment memorizes more information regard-
ing its hydrological response characteristics.
On the other hand, exploring the performance of different
methods on similar catchments (Sawicz et al., 2011; Toth,
2013; Patil and Stieglitz, 2011; Sivakumar et al., 2014) and
finding bases for generalized guidelines on the selection of
the most appropriate predictive uncertainty method in opera-
tional flood forecasting practices is also important and could
be considered in the further studies as well.
When different predictive uncertainty methods are eval-
uated based on their comparative performance, it is more
important to have validation measures incorporating certain
aspects of rainfall–runoff processes, i.e. varying flow condi-
tions. For example, the accuracy of the hydrological model
decreases during high flow events, and thus the amount of
residual uncertainty increases. This necessitates exploring
validation measures linking the prediction interval to the (hy-
drological) model quality, e.g. by employing the weighted
mean prediction interval (Dogulu et al., 2014).
There are other possibilities for further improvements in
both of the presented methods. For example, the different
configurations of QR, the alternative clustering techniques
for UNEEC, as well as using it in instance-based learning
(e.g. locally weighted regression) as the predicting model can
be explored further.
Acknowledgements. The results presented in this paper were ob-
tained during the thesis research of the European Erasmus Mundus
Flood Risk Management Master students Nilay Dogulu and Patricia
López López. The Environment Agency is gratefully acknowledged
for providing the data for the Upper Severn catchments in the UK.
We are very grateful to the two anonymous reviewers and the HESS
Editor, E. Todini, for very useful comments and suggestions.
Edited by: E. Todini
References
Aubert, D., Loumagne, C., and Oudin, L.: Sequential assimila-
tion of soil moisture and streamflow data in a conceptual rain-
fallrunoff model, J. Hydrol., 280, 145–161, doi:10.1016/S0022-
1694(03)00229-4a, 2003.
Bailey, R. and Dobson, C.: Forecasting for floods in the Severn
catchment, J. Inst. Water Eng. Sci., 35, 168–178, 1981.
Barnwal, P. and Kotani, K.: Climatic impacts across agricultural
crop yield distributions: An application of quantile regression on
rice crops in Andhra Pradesh, India, Ecol. Econ., 87, 95–109,
2013.
Battiti, R.: Using mutual information for selecting features in super-
vised neural net learning, IEEE T. Neural Networ., 5, 537–550,
1994.
Baur, D., Saisana, M., and Schulze, N.: Modelling the effects of
meteorological variables on ozone concentration – a quantile re-
gression approach, Atmos. Environ., 38, 4689–4699, 2004.
Bell, V. A. and Moore, R. J.: The sensitivity of catchment runoff
models to rainfall data at different spatial scales, Hydrol. Earth
Syst. Sci., 4, 653–667, doi:10.5194/hess-4-653-2000, 2000.
Bensaid, A. M., Hall, L. O., Bezdek, J. C., Clarke, L. P., Silbiger, M.
L., Arrington, J. A., and Murtagh, R. F.: Validity-guided (re) clus-
tering with applications to image segmentation, IEEE T. Fuzzy
Syst., 4, 112–123, 1996.
Bergström, S.: Development and application of a conceptual runoff
model for Scandinavian catchments, report RHO7, 118, Swed.
Meteorol. Hydrol. Inst., Norrköping, Sweden, 1976.
Beven, K. and Binley, A.: The future of distributed models: Model
calibration and uncertainty prediction. Hydrol. Process., 6, 279–
298, doi:10.1002/hyp.3360060305, 1992.
Bogner, K. and Pappenberger, F.: Multiscale error analy-
sis, correction, and predictive uncertainty estimation in a
flood forecasting system, Water Resour. Res., 47, W07524,
doi:10.1029/2010WR009137, 2011.
Bondell, H. D., Reich, B. J., and Wang, H.: Noncrossing Quan-
tile Regression curve estimation, Biometrika, 97, 825–838,
doi:10.1093/biomet/asq048, 2010.
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C.: Time series anal-
ysis: forecasting and control (4th ed.), p. 16, John Wiley & Sons,
Inc., Hoboken, New Jersey, 2008.
Bremnes, J. B.: Probabilistic wind power forecasts using local
quantile regression, Wind Energy, 7, 47–54, doi:10.1002/we.107,
2004.
Brown, J. D. and Heuvelink, G. B. M.: Assessing uncertainty propa-
gation through physically based models of soil water flow and so-
lute transport, in: Encyclopedia of Hydrological Sciences, 6:79,
edited by: Anderson, M. G., John Wiley, New York, 2005.
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015
3200 N. Dogulu et al.: Estimation of predictive hydrologic uncertainty
Cannon, A. J.: Quantile regression neural networks: Implementa-
tion in R and application to precipitation downscaling, Comput.
Geosci., 37, 1277–1284, 2011.
Coccia, G. and Todini, E.: Recent developments in predictive
uncertainty assessment based on the model conditional pro-
cessor approach, Hydrol. Earth Syst. Sci., 15, 3253–3274,
doi:10.5194/hess-15-3253-2011, 2011.
Coxon, G., Freer, J., Westerberg, I., Woods, R., Smith, P., and Wa-
gener, T.: A generalised framework for large-scale evaluation of
discharge uncertainties across England and Wales, EGU General
Assembly, Vienna, Austria, 27 April–2 May 2014, EGU2014-
10157, 2014.
Di Baldassarre, G. and Montanari, A.: Uncertainty in river discharge
observations: a quantitative analysis, Hydrol. Earth Syst. Sci., 13,
913–921, doi:10.5194/hess-13-913-2009, 2009.
Dogulu, N., Solomatine, D. P., and Shrestha, D. L.: Applying clus-
tering approach in predictive uncertainty estimation: a case study
with the UNEEC method, EGU General Assembly, Vienna, Aus-
tria, 27 April–2 May 2014, EGU2014-5992, 2014.
EA: Environment Agency: River levels: Midlands, available at:
http://www.environment-agency.gov.uk/homeandleisure/floods/
riverlevels/, (last access: 1 October 2013), 2009.
Friederichs, P. and Hense, A.: Statistical Downscaling of Extreme
Precipitation Events Using Censored Quantile Regression, Mon.
Weather Rev., 135, 2365–2378, doi:10.1175/MWR3403.1, 2007.
Gupta, H. V., Beven, K. J., and Wagener, T.: Model calibration and
uncertainty estimation, in: Encyclopedia of Hydrological Sci-
ences, 11:131, edited by: Anderson, M. G., John Wiley, New
York, 2005.
Halkidi, M., Batistakis, Y., and Vazirgiannis, M.: On cluster-
ing validation techniques, J. Intell. Inf. Syst., 17, 107–145,
doi:10.1023/A:1012801612483, 2001.
Hill, T. and Neal, C.: Spatial and temporal variation in pH, alkalinity
and conductivity in surface runoff and groundwater for the Upper
River Severn catchment. Hydrol. Earth Syst. Sci., 1, 697–715,
doi:10.5194/hess-1-697-1997, 1997.
Jin, X., Xu, C. Y., Zhang, Q., and Singh, V. P.: Parameter and mod-
eling uncertainty simulated by GLUE and a formal Bayesian
method for a conceptual hydrological model, J. Hydrol., 383,
147–155, doi:10.1016/j.jhydrol.2009.12.028, 2010.
Kobold, M. and Sušelj, K.: Precipitation forecasts and their uncer-
tainty as input into hydrological models, Hydrol. Earth Syst. Sci.,
9, 322–332, doi:10.5194/hess-9-322-2005, 2005.
Koenker, R.: Quantile Regression, Cambridge University Press,
New York, 2005.
Koenker, R. and Bassett Jr., G.: Regression Quantiles, Economet-
rica, 1, 33–50, 1978.
Koenker, R. and Hallock, K.: Quantile Regression, J. Econ. Per-
spect., 15, 143–156, 2001.
Krzysztofowicz, R.: Bayesian theory of probabilistic forecasting via
deterministic hydrologic model, Water Resour. Res., 35, 2739–
2750, doi:10.1029/1999WR900099, 1999.
Krzysztofowicz, R.: The case for probabilistic forecasting in hy-
drology, J. Hydrol., 249, 2–9, 2001.
Krzysztofowicz, R. and Kelly, K. S.: Hydrologic uncertainty proces-
sor for probabilistic river stage forecasting, Water Resour. Res.,
36, 3265–3277, doi:10.1029/2000WR900108, 2000.
Kudryavtsev, A. A.: Using quantile regression for
rate-making, Insur. Math. Econ., 45, 296–304,
doi:10.1016/j.insmatheco.2009.07.010, 2009.
Laio, F. and Tamea, S.: Verification tools for probabilistic forecasts
of continuous hydrological variables, Hydrol. Earth Syst. Sci.,
11, 1267–1277, doi:10.5194/hess-11-1267-2007, 2007.
Lee, H., Seo, D.-J., and Koren, V.: Assimilation of streamflow
and in-situ soil moisture data into operational distributed hy-
drologic models: Effects of uncertainties in the data and initial
model soil moisture states, Adv. Water Resour., 34, 1597–1615,
doi:10.1016/j.advwatres.2011.08.012, 2011.
Lindström, G., Johansson, B., Persson, M., Gardelin, M., and
Bergström, S.: Development and test of the distributed
HBV-96 hydrological model, J. Hydrol., 201, 272–288,
doi:10.1016/S0022-1694(97)00041-3, 1997.
Liu, Y. and Gupta, H. V.: Uncertainty in hydrologic modeling: To-
ward an integrated data assimilation framework, Water Resour.
Res., 43, W07401, doi:10.1029/2006WR005756, 2007.
López López, P., Verkade, J. S., Weerts, A. H., and Solomatine, D.
P.: Alternative configurations of Quantile Regression for estimat-
ing predictive uncertainty in water level forecasts for the Upper
Severn River: a comparison, Hydrol. Earth Syst. Sci., 18, 3411–
3428, doi:10.5194/hess-18-3411-2014, 2014.
Marsh, T. and Hannaford, J.: UK Hydrometric Register, Hydrologi-
cal Data UK Series, Centre for Ecology and Hydrology, Walling-
ford, UK, 1–210, 2008.
Møller, J. K., Nielsen, H. A., and Madsen, H.: Time-adaptive quan-
tile regression, Comput. Stat. Data An., 52, 1292–1303, 2008.
Montanari, A.: What do we mean by ’uncertainty’? The need for
a consistent wording about uncertainty assessment in hydrology,
Hydrol. Process., 21, 841–845, 2007.
Montanari, A. and Brath, A.: A stochastic approach for assess-
ing the uncertainty of rainfall–runoff simulations, Water Resour.
Res., 40, W01106, doi:10.1029/2003WR002540, 2004.
Moore, R. J., Jones, D. A., Cox, D. R., and Isham, V. S.: Design
of the HYREX raingauge network, Hydrol. Earth Syst. Sci., 4,
523–530, doi:10.5194/hess-4-523-2000, 2000.
Mukolwe, M. M., Di Baldassarre, G., Werner, M., and Solomatine,
D. P.: Flood modelling: parameterisation and inflow uncertainty,
P. I. Civil Eng.-Wat. M., 167, 51–60, 2014.
Munir, S., Chen, H., and Ropkins, K.: Modelling the impact of road
traffic on ground level ozone concentration using a quantile re-
gression approach, Atmos. Environ., 60, 283–291, 2012.
Nasseri, M. and Zahraie, B.: Application of simple clustering on
space-time mapping of mean monthly rainfall pattern, Int. J. Cli-
matol., 31, 732–741, doi:10.1002/joc.2109, 2011.
Nasseri, M., Zahraie, B., Ansari, A., and Solomatine, D. P.: Un-
certainty assessment of monthly water balance models based on
Incremental Modified Fuzzy Extension Principle method, J. Hy-
droinform., 15, 1340–1360, doi:10.2166/hydro.2013.159, 2013.
Nielsen, H. A., Madsen, H., and Nielsen, T. S.: Using Quantile Re-
gression to extend an existing wind power forecasting system
with probabilistic forecasts, Wind Energy, 9, 95–108, 2006.
Pappenberger, F. and Beven, K. J.: Ignorance is bliss: Or seven
reasons not to use uncertainty analysis, Water Resour. Res., 42,
W05302, doi:10.1029/2005WR004820, 2006.
Patil, S. and Stieglitz, M.: Hydrologic similarity among catchments
under variable flow conditions, Hydrol. Earth Syst. Sci., 15, 989–
997, doi:10.5194/hess-15-989-2011, 2011.
Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015 www.hydrol-earth-syst-sci.net/19/3181/2015/
N. Dogulu et al.: Estimation of predictive hydrologic uncertainty 3201
Pianosi, F. and Raso, L.: Dynamic modeling of predictive uncer-
tainty by regression on absolute errors, Water Resour. Res., 48,
W03516, doi:10.1029/2011WR010603, 2012.
Pianosi, F., Shrestha, D. L., and Solomatine, D. P.: ANN-based
representation of parametric and residual uncertainty of models,
IEEE IJCNN, 1–6, doi:10.1109/IJCNN.2010.5596852, 2010.
Quinlan, J. R.: Learning with continuous classes, in: Proceedings
of the 5th Australian joint Conference on Artificial Intelligence,
Hobart, Tasmania, 16–18 November 1992, 343–348, 1992.
Reggiani, P. and Weerts, A.: A Bayesian approach to decision-
making under uncertainty: An application to real-time
forecasting in the river Rhine, J. Hydrol., 356, 56–59,
doi:10.1016/j.jhydrol.2008.03.027, 2008.
Reggiani, P., Renner, M., Weerts, A., and Van Gelder, P.: Uncer-
tainty assessment via Bayesian revision of ensemble streamflow
predictions in the operational river Rhine forecasting system,
Water Resour. Res., 45, W02428, doi:10.1029/2007WR006758,
2009.
Roscoe, K. L., Weerts, A. H., and Schroevers, M.: Estimation of the
uncertainty in water level forecasts at ungauged river locations
using Quantile Regression, Int. J. River Basin Manage., 10, 383–
394, 2012.
Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo,
G.: Catchment classification: empirical analysis of hydrologic
similarity based on catchment function in the eastern USA, Hy-
drol. Earth Syst. Sci., 15, 2895–2911, doi:10.5194/hess-15-2895-
2011, 2011.
Shrestha, D. L. and Solomatine, D. P.: Machine learning approaches
for estimation of prediction interval for the model output, Neu-
ral Networks, 19, 225–235, doi:10.1016/j.neunet.2006.01.012,
2006.
Shrestha, D. L. and Solomatine, D. P.: Data-driven approaches for
estimating uncertainty in rainfall-runoff modelling, Int. J. River
Basin Manage., 6, 109–122, 2008.
Shrestha, D. L., Rodriguez, J., Price, R. K., and Solomatine, D. P.:
Assessing model prediction limits using fuzzy clustering and ma-
chine learning, Proc. 7th Int. Conf. On Hydroinformatics, 4–8
September, Nice, France, 2006.
Shrestha, D. L., Robertson, D. E., Wang, Q. J., Pagano, T. C., and
Hapuarachchi, H. A. P.: Evaluation of numerical weather pre-
diction model precipitation forecasts for short-term streamflow
forecasting purpose, Hydrol. Earth Syst. Sci., 17, 1913–1931,
doi:10.5194/hess-17-1913-2013, 2013.
Sikorska, A. E., Scheidegger, A., Banasik, K., and Rieckermann, J.:
Considering rating curve uncertainty in water level predictions,
Hydrol. Earth Syst. Sci., 17, 4415–4427, doi:10.5194/hess-17-
4415-2013, 2013.
Singh, S. K., McMillan, H., and Bárdossy, A.: Use of the data depth
function to differentiate between case of interpolation and extrap-
olation in hydrological model prediction, J. Hydrol., 477, 213–
228, doi:10.1016/j.jhydrol.2012.11.034, 2013.
Sivakumar, B., Singh, V. P., Berndtsson, R., and Khan, S. K.:
Catchment Classification Framework in Hydrology: Challenges
and Directions, J. Hydrol. Eng., doi:10.1061/(ASCE)HE.1943-
5584.0000837, 2015.
Solomatine, D. P.: Modelling Theory and Uncertainty (An Intro-
duction) – Part 2: Analysis of model uncertainty, Lecture Notes.
UNESCO-IHE Institute for Water Education, Delft, the Nether-
lands, 2013.
Solomatine, D. P. and Dulal, K. N.: Model trees as an alternative to
neural networks in rainfall–runoff modelling, Hydrol. Sci. J., 48,
399–411, 2003.
Solomatine, D. P. and Shrestha, D. L.: A novel method to estimate
model uncertainty using machine learning techniques, Water Re-
sour. Res., 45, W00B11, doi:10.1029/2008WR006839, 2009.
Solomatine, D. P. and Wagener, T.: Hydrological Modeling, in:
Treatise on Water Science, vol. 2, edited by: Wilderer, P., Aca-
demic Press, Oxford, 435–457, 2011.
Solomatine, D. P., Dibike, Y. B., and Kukuric, N.: Automatic cal-
ibration of groundwater models using global optimization tech-
niques, Hydrol. Sci. J., 44, 879–894, 1999.
Tayfur, G., Zucco, G., Brocca, L., and Moramarco, T.: Coupling
soil moisture and precipitation observations for predicting hourly
runoff at small catchment scale, J. Hydrol., 510, 363–371,
doi:10.1016/j.jhydrol.2013.12.045, 2014.
Taylor, J. W.: Forecasting daily supermarket sales using exponen-
tially weighted quantile regression, Eur. J. Oper. Res., 178, 154–
167, 2007.
Todini, E.: A model conditional processor to assess predictive un-
certainty in flood forecasting, Int. J. River Basin Manage., 6,
123–137, 2008.
Toth, E.: Catchment classification based on characterisation of
streamflow and precipitation time series, Hydrol. Earth Syst. Sci.,
17, 1149–1159, doi:10.5194/hess-17-1149-2013, 2013.
van Andel, S. J., Weerts, A., Schaake, J., and Bogner,
K.: Post-processing hydrological ensemble predictions in-
tercomparison experiment, Hydrol. Process., 27, 158–161,
doi:10.1002/hyp.9595, 2013.
Verkade, J. S. and Werner, M. G. F.: Estimating the benefits of
single value and probability forecasting for flood warning, Hy-
drol. Earth Syst. Sci., 15, 3751–3765, doi:10.5194/hess-15-3751-
2011, 2011.
Wagener, T. and Gupta, H. V.: Model identification for hydrological
forecasting under uncertainty, Stoch. Environ. Res. Risk A., 19,
378–387, doi:10.1007/s00477-005-0006-5, 2005.
Wallingford 1994: Wallingford Water, a food forecasting and warn-
ing system for the river Soar, Wallingford Water, Wallingford,
UK, 1994.
Wallingford 1997: HR Wallingford, ISIS software, available at:
http://www.isisuser.com/isis/ (last access: 1 October 2013), HR
Wallingford, Hydraluic Unit, Wallingford, UK, 1997.
Weerts, A. H., Winsemius, H. C., and Verkade, J. S.: Estima-
tion of predictive hydrological uncertainty using quantile re-
gression: examples from the National Flood Forecasting Sys-
tem (England and Wales), Hydrol. Earth Syst. Sci., 15, 255–265,
doi:10.5194/hess-15-255-2011, 2011.
Werner, M., Schellekens, J., Gijsbers, P., van Dijk, M., van
den Akker, O., and Heynert, K.: The Delft-FEWS flow
forecasting system, Environ. Modell. Softw., 40, 65–77, 30
doi:10.1016/j.envsoft.2012.07.010, 2013.
Xie, X. L. and Beni, G.: A validity measure for fuzzy clustering,
IEEE T. Pattern. Anal., 13, 841–847, 1991.
www.hydrol-earth-syst-sci.net/19/3181/2015/ Hydrol. Earth Syst. Sci., 19, 3181–3201, 2015