An evaluation of the impact of model structure on hydrological
modelling uncertainty for streamflow simulation
Michael B. Butts (a,*), Jeffrey T. Payne (a,1), Michael Kristensen (b), Henrik Madsen (a)
(a) River and Flood Management Department, Water Resources Division, DHI Water and Environment, Agern Alle 11, DK 2970 Hørsholm, Denmark
(b) Hydrology, Soil and Waste Department, Water Resources Division, DHI Water and Environment, Agern Alle 11, DK 2970 Hørsholm, Denmark
Received 11 August 2003; revised 21 January 2004; accepted 29 March 2004
Abstract
Operational flood management and warning requires the delivery of timely and accurate forecasts. The use of distributed and
physically based forecasting models can provide improved streamflow forecasts. However, for operational modelling there is a
trade-off between the complexity of the model descriptions necessary to represent the catchment processes, the accuracy and
representativeness of the input data available for forecasting and the accuracy required to achieve reliable, operational flood
management and warning. Four sources of uncertainty occur in deterministic flow modelling: random or systematic errors in the
model inputs or boundary condition data, random or systematic errors in the recorded output data, uncertainty due to
sub-optimal parameter values and errors due to incomplete or biased model structure. While many studies have addressed the
issues of sub-optimal parameter estimation, parameter uncertainty and model calibration, very few have examined the impact of
model structure error and complexity on model performance and modelling uncertainty. In this study a general hydrological
framework is described that allows the selection of different model structures within the same modelling tool. Using this tool a
systematic investigation is carried out to determine the performance of different model structures for the DMIP study Blue
River catchment using a split sample evaluation procedure. This investigation addresses two questions. First, different model
structures are expected to perform differently, but is there a trade-off between model complexity and predictive ability?
Secondly, how does the magnitude of model structure uncertainty compare to the other sources of uncertainty? The relative
performance of different acceptable model structures is evaluated as a representation of structural uncertainty and compared to
estimates of the uncertainty arising from measurement uncertainty, parametric uncertainty and the rainfall input. The results
show, first, that model performance is strongly dependent on model structure. Distributed routing and, to a lesser extent,
distributed rainfall were found to be the dominant processes controlling simulation accuracy in the Blue River basin.
Second, the sensitivity to variations in acceptable model structure is of the same magnitude as the uncertainties arising from
the other evaluated sources. This suggests that for practical hydrological predictions there are important benefits in exploring
different model structures as part of the overall modelling approach. Furthermore, model structural uncertainty should be
considered in assessing model uncertainties. Finally, our results show that combinations of several model structures can be a
means of improving hydrological simulations.
© 2004 Elsevier B.V. All rights reserved.
Journal of Hydrology 298 (2004) 242–266
www.elsevier.com/locate/jhydrol
0022-1694/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.jhydrol.2004.03.042
1 Present address: Natural Heritage Institute, 926 J Street, Suite 612, Sacramento, CA 95814, USA.
* Corresponding author. Tel.: +45-451-69272; fax: +45-451-69200.
E-mail address: mib@dhi.dk (M.B. Butts).
Keywords: Model structure uncertainty; Distributed hydrological modelling; Automatic calibration; Hydrological simulation uncertainty; Flow
forecasting
1. Introduction
Beven (2000) breaks down the development of a
hydrological model into the following steps:
1. The Perceptual model: deciding on the processes
2. The Conceptual model: deciding on the equations
3. The Procedural model: developing the model code
4. Model calibration: getting values of parameters
5. Model validation: confirming applicability and
accuracy
In many practical applications, the considerations
in steps 1 and 2 lead to the selection of an appropriate
existing model code (Refsgaard, 1996). While there
exist many hydrological modelling tools (for a
comprehensive overview see Singh, 1995; Beven,
2000), in most cases both the perceptual model
(determining the dominant flow mechanisms and the
most significant processes to be described) and the
conceptual model (the mathematical description of
these processes) are fixed. The selection of specific
perceptual and conceptual models determines the
model structure. A change to either the perceptual or
conceptual model then usually requires the selection
of another model code. Indeed the model structure is
often used as a way of classifying and distinguishing
different hydrological models (Singh, 1995). This is
also reflected in the DMIP study itself, where different
modelling tools having different structures are used in
the intercomparison (Reed et al., 2004).
There are several factors that argue for the value of
a modelling tool that allows changes in the model
structure. First, in many cases it is sufficient to only
treat the dominant flow processes while other
processes can be ignored. Secondly, as the
understanding of the hydrologist grows and evolves
either as a result of new information or as direct
feedback from the modelling exercise, his perception
of the important processes may change or evolve. In
some cases this may be a deliberate strategy where the
initial model structure used is quite simple and the
structure is adapted to improve the simulations made
by the model until acceptable results are achieved
(Atkinson et al., 2003; Farmer et al., 2003).
Thirdly, the ability to represent a particular process
in the model is often determined by the data available
either to parameterise the process description or
calibrate and validate the model simulations for that
process. As new data become available or are applied,
the model structure may require revision.
More importantly, different applications require
different levels of complexity to represent the same
hydrological system. For example, a river model
developed for a flood design problem may need to
be much more detailed than a flood forecasting model
developed for flood warning in the same river.
Finally, the ability to develop and apply new process
descriptions within the same modelling framework
provides a powerful research tool.
In a number of modelling systems it is possible to
exclude specific processes. In other tools, changes in
either the model structure or the model code have led
to variants of a particular tool (e.g. TOPMODEL,
Beven, 1997; TOPKAPI, Ciarapica and Todini, 2002;
TOPNET, Reed et al., 2004). However, as far as the
authors are aware, there appear to be very few
hydrological modelling tools that provide
comprehensive facilities for modifying the model
structure and very few studies that have investigated
the effects of model structure in the context of
modelling performance and modelling uncertainty.
Koren et al. (2003) describe a research modelling
framework where it is possible to change the process
descriptions within a grid-based framework to
evaluate their accuracy in hydrological simulation
and forecasting. Other authors emphasise that it is
straightforward to include and use other process
descriptions within their models (e.g. Calver and
Wood, 1995); however, there appears to be relatively
little documentation of the systematic application of
such an approach. One of the few studies that have
carried out a systematic intercomparison of different
model structures is documented in the paper of
Refsgaard and Knudsen (1996). Three different
models embodying three quite different model
structures and degrees of spatial distribution are
compared using a systematic calibration and
validation procedure. Nevertheless, only relatively
few model structures are explored. Farmer et al.
(2003) and Atkinson et al. (2002, 2003) apply models
of increasing complexity to investigate the trade-off
between model complexity and prediction accuracy
and to determine the physical controls on flow
prediction. The different model structures investigated
were conceptual rainfall-runoff models based on one
or more simple storages where increasing complexity
is obtained by adding thresholds, more storages, etc.
These can be viewed as simplifications of a more
general conceptual model. The idea of using several
different models (model structures) for ensemble
simulations is well-established in weather and climate
prediction (e.g. World Meteorological Organisation
(WMO), 2001) but is relatively unexplored in the
hydrological literature (Georgakakos et al., 2004).
In this study a hydrological modelling framework
is developed that permits changes in the model
structure, including both conceptual and physics-based
process descriptions, to be made within the same
modelling tool. This tool is used to evaluate
the performance of different model structures and
the impact of variations in model structure on the
uncertainty of hydrological simulations for one of the
DMIP basins, the Blue River basin in Oklahoma.
An evaluation of the different model structures is
carried out against a number of performance criteria
using split-sample testing. The simulations
produced by the different model structures form an
ensemble of model simulations rather than a
single-valued simulation. For these calibrated models,
the resulting variations in model simulations can be
interpreted as resulting from the uncertainty in model
structure. These variations are compared to estimates
of the impact of uncertainty in the rainfall input data
and uncertainty in the model parameters for the same
catchment. As pointed out by Georgakakos et al.
(2004), there are several studies that have examined
the influence of parametric and input uncertainty in
ensemble flow forecasting; however, few studies of
model structure uncertainties have been reported in
the hydrological literature. In the context of the DMIP
study, there is some evidence that an ensemble of
model simulations using different model structures
should be considered for operational forecasting
(Georgakakos et al., 2004). This new framework
allows the generation of such ensembles within the
same tool and this idea is further explored here.
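One way to picture an ensemble built from several model structures is the small Python sketch below. It is illustrative only. It assumes that each calibrated structure's simulated discharge is already available as a list on a common set of time steps. The helper name `structural_ensemble` is hypothetical and is not part of MIKE 11 or MIKE SHE. The sketch summarises the ensemble by its per-step mean and its min/max envelope, which is one simple way to express the spread attributed to structural uncertainty.

```python
from statistics import mean


def structural_ensemble(simulations):
    """Summarise an ensemble built from several calibrated model structures.

    Hypothetical helper (not part of MIKE 11 / MIKE SHE).
    simulations: dict mapping a structure ID (e.g. "s2", "g1") to a list of
    simulated discharges, all on the same time steps.
    Returns the per-step ensemble mean and the min/max envelope.
    """
    series = list(simulations.values())
    if not series:
        raise ValueError("the ensemble needs at least one member")
    n_steps = len(series[0])
    if any(len(s) != n_steps for s in series):
        raise ValueError("all structures must cover the same time steps")

    # zip(*series) yields one tuple of member values per time step
    per_step = list(zip(*series))
    return {
        "mean": [mean(values) for values in per_step],
        "lower": [min(values) for values in per_step],
        "upper": [max(values) for values in per_step],
    }


# Hypothetical discharges (m3/s) for three structures over four time steps
ensemble = structural_ensemble({
    "s2": [10.0, 42.0, 30.0, 18.0],
    "s4": [11.0, 38.0, 33.0, 17.0],
    "g1": [9.0, 46.0, 27.0, 19.0],
})
print(ensemble["mean"])   # [10.0, 42.0, 30.0, 18.0]
print(ensemble["upper"])  # [11.0, 46.0, 33.0, 19.0]
```

An equally weighted mean is only the simplest way of combining members. Weighting each structure by its calibration performance would be a natural alternative.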
2. The modelling framework
A general hydrological modelling tool has been
developed that allows different model structures to
be applied within the same modelling framework.
This modelling framework integrates the current
capabilities of two existing tools, MIKE 11 and MIKE
SHE, and extends these models by providing varying
levels of model complexity, spatial variability and
model structure for the different catchment and
channel processes.
The MIKE 11 modelling system provides
distributed routing and distributed rainfall-runoff
modelling by dividing the basin of interest into
sub-catchments linked to the river network to capture
the spatial variations in the meteorological
forcing, sub-basin characteristics and channel routing.
The continuous simulation of the rainfall-runoff
process in each sub-catchment is carried out here
using the NAM conceptual model (Butts et al., 2001;
Madsen, 2000; Refsgaard, 1997; Havnø et al., 1995).
The runoff from these basins becomes distributed
inflow to the river or channel network.
MIKE SHE represents a further development of the
SHE modelling concept described in Abbott et al.
(1986a,b). Each of the main processes within the
hydrological cycle is represented as a process-orientated
module, and the mutual interactions between these
processes are also represented. The SHE model is in fact
an implementation of the modelling paradigm proposed by
Freeze and Harlan (1969). In this original blueprint, different
flow processes are described by the governing partial
differential equations and these are then solved by
discrete numerical approximations in space and time
(Refsgaard and Storm, 1995; Abbott and Refsgaard,
1996; Storm and Refsgaard, 1996). The spatial
variations in meteorological forcing and hydrological
characteristics are then represented on this finite
difference grid. Until recently this blueprint formed
the basis of the modelling approach used in MIKE SHE.
However, there are a number of important
limitations to the applicability of such models.
Firstly, it is widely recognised that such models require a significant amount of data and the cost of data acquisition may be high (Calver and Wood, 1995). Secondly, the relative complexity of this model requires substantial execution time. This may be important, for example, in flood forecasting, where short execution times are required to provide timely forecasts, or in evaluating the impact of climate change, where several scenarios over extremely long time scales are required. Their relative complexity may also lead to over-parameterised descriptions for simple applications such as the simulation of discharges. Finally, this type of model attempts to represent flow processes at the grid scale with mathematical descriptions that at best are valid for small-scale experimental conditions; there is no guarantee that these descriptions remain valid at the scale of the element grids used in hydrological models. For example, there is considerable debate as to whether the Darcian description of unsaturated flow is valid at the plot scale (Flury et al., 1994; Hornberger et al., 1991), and in some cases even at the laboratory scale (Hollenbeck and Jensen, 1998; Mortensen et al., 2001), let alone at the grid scale. The inherent heterogeneity of natural soils means that any process description must either ignore sub-grid variations and processes or devise strategies to account for them. As very little measurement information is available at the grid scale, different strategies have been devised to derive large-scale or effective parameters from small-scale measurements (Refsgaard and Butts, 1999), so that the flow description is effectively conceptual rather than physics-based.
These limitations provide the motivation for the
inclusion of simpler conceptual models within the
same framework, to reduce execution time or to
address sub-grid processes for large scale modelling.
Similarly, as pointed out in the Introduction, there are
several advantages in providing modelling tools
where the process descriptions and modelling
structures can be adapted to the specific application.
Such a framework has been developed as part of the
new MIKE SHE/MIKE 11 modelling system.
3. Model structures
Variations in model structure are used here in the
context of simulation uncertainty. Four sources of
uncertainty occur in deterministic flow modelling:
• random or systematic errors in the model inputs
(boundary or initial conditions)
• random or systematic errors in the recorded output
data used to measure simulation accuracy
• uncertainties due to sub-optimal parameter values
• uncertainties due to incomplete or biased model
structure
Once the model structure is specified, the model
parameters need to be estimated or calibrated to
obtain the best simulation. While many studies have
addressed the issues of parameter estimation,
equifinality and model calibration (Beven and Freer,
2001; Madsen, 2003), very few have examined the
impact of model structure error and model complexity
on model performance.
Model structure includes a whole range of choices
and assumptions made by the modeller either
explicitly or implicitly in applying a hydrological
model. Examples of different model structures include:
• different process descriptions
• different coupling of the processes
• different numerical discretisation
• different representations of the spatial variability:
zones, grids, sub-catchments, etc.
• different element scale and sub-grid process
representations including distribution functions,
different degrees of lumping, effective parameterisation, etc.
• different interpretations and classifications of soil
type, geology, land use cover, vegetation, etc.
Different process descriptions can arise in a
number of ways. In the simplest case certain processes
are not important, for example groundwater flows in a
flood forecasting application, and can be omitted.
Alternatively, a process may be modelled at different
degrees of approximation; for example, channel flows
might be represented by a kinematic wave description
rather than a fully dynamic wave approximation of the
St Venant equations. Another possibility is to select
completely different mathematical representations of
the underlying process, for example using linear
reservoirs to represent sub-surface flow instead
of the Darcy flow equations. Another important part of
the model structure is the definition of how
the different processes are coupled. Is there recharge
from the river into the surrounding aquifer or not?
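The kind of conceptual alternative mentioned above can be illustrated with the sketch below, which routes an inflow series through a single linear reservoir. In this reservoir, outflow is proportional to storage (Q = S/k). It is a generic textbook formulation, not the specific sub-surface scheme used in structures g4 and g5. The function name, the time constant `k` and the assumption that inflow is constant within each time step are illustrative choices made here.

```python
import math


def linear_reservoir(inflow, k, dt=1.0, s0=0.0):
    """Route an inflow series through a single linear reservoir, Q = S / k.

    Illustrative sketch, not the g4/g5 sub-surface scheme.
    Storage follows dS/dt = I - S/k. With the inflow I held constant over
    each step of length dt, this has the exact solution
        S(t + dt) = I*k + (S(t) - I*k) * exp(-dt / k),
    which is applied below. The returned outflow is the end-of-step value S/k.
    k and dt must use the same time units; s0 is the initial storage.
    """
    if k <= 0 or dt <= 0:
        raise ValueError("k and dt must be positive")
    decay = math.exp(-dt / k)
    storage = s0
    outflow = []
    for i in inflow:
        equilibrium = i * k  # storage the reservoir tends towards under constant inflow i
        storage = equilibrium + (storage - equilibrium) * decay
        outflow.append(storage / k)
    return outflow


# A single pulse of inflow followed by an exponential recession
q = linear_reservoir([10.0, 0.0, 0.0, 0.0], k=5.0)
print([round(x, 3) for x in q])
```

After the pulse, each outflow value is the previous one multiplied by exp(-dt/k), which is the characteristic recession of a linear reservoir. Common extensions are cascades of reservoirs or separate fast and slow reservoirs. Choosing among such variants is itself a structural decision of the kind discussed here.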
The numerical discretisation of the mathematical
equations in space can be performed using finite
difference grids, finite elements or finite volumes, or
by using hydrological response units, zones and
sub-catchments, in the form of grids or polygons. The
grids used may be one-, two- or three-dimensional,
also giving rise to different model structures. For a
particular application the size of the model elements
may be varied to provide different degrees of spatial
resolution or to reduce computation time. In the case
of large grids, different degrees of sub-grid
parameterisation may be adopted. For example, a
Richards equation solution for unsaturated flow (e.g.
Storm and Refsgaard, 1996) may be replaced by a VIC
(Variable Infiltration Capacity) approach (e.g. Lohman
et al., 1998).
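A VIC-type approach replaces a point-scale infiltration equation with a statistical distribution of storage capacity across the grid cell. The sketch below uses the capacity curve commonly associated with VIC and the Xinanjiang model. In that curve, the fraction of the cell with capacity at or below i is A = 1 - (1 - i/i_m)^b. The sketch is included only to show what a sub-grid distribution function looks like in practice. The formulation, the function names and the parameters `w_max` (cell-average storage capacity) and `b` (shape parameter) are assumptions drawn from the general VIC literature, not from this study.

```python
def saturated_fraction(w, w_max, b):
    """Fraction of the cell that is saturated for cell-average storage w.

    Assumed VIC/Xinanjiang-type curve A = 1 - (1 - i/i_m)**b, whose
    cell-average maximum storage is w_max = i_m / (1 + b).
    """
    w = min(max(w, 0.0), w_max)  # keep the storage inside its physical range
    return 1.0 - (1.0 - w / w_max) ** (b / (1.0 + b))


def direct_runoff(precip, w, w_max, b):
    """Direct (saturation-excess) runoff for a precipitation depth precip.

    precip, w and w_max share the same depth units (e.g. mm); b > 0.
    """
    w = min(max(w, 0.0), w_max)
    i_m = w_max * (1.0 + b)  # largest point storage capacity in the cell
    # Point on the capacity curve that corresponds to the current storage
    i0 = i_m * (1.0 - (1.0 - w / w_max) ** (1.0 / (1.0 + b)))
    if i0 + precip >= i_m:
        # The whole cell saturates: everything beyond the remaining deficit runs off
        return precip - (w_max - w)
    return precip - (w_max - w) + w_max * (1.0 - (i0 + precip) / i_m) ** (1.0 + b)


# The same storm produces more runoff from a wet cell than from a dry one
print(direct_runoff(10.0, w=0.0, w_max=100.0, b=1.0))
print(direct_runoff(10.0, w=80.0, w_max=100.0, b=1.0))
```

The key point for model structure is that runoff is generated from a growing saturated fraction of the cell rather than from a single grid-average soil state.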
Especially for sub-surface processes, the underlying
geology is highly uncertain and subject to
interpretation. Therefore, different geological
interpretations can lead to different model structures.
The above list is by no means exhaustive and the
reader is referred to recent reviews of hydrological
modelling (Singh, 1995; Beven, 2000; Butts et al., 2002)
for a more comprehensive overview of possible
modelling approaches and structures.
4. Methodology
4.1. Selecting model structures
For the purpose of this study the following
methodology was adopted. First, an appropriate
model structure is selected. It was beyond the scope
of this study to investigate all possible model
structures. However, the structures used here cover a
wide range of the possible variations described in the
previous section that would be appropriate for
modelling the Blue River basin. The structures
selected here are all plausible alternatives for the
hydrological simulation of flood flows incorporating
the main processes occurring in the basin.
The complete list of the different model structures investigated here and their main characteristics is given in Table 1. A short explanation of the main distinguishing features of each structure is given in Table 2. In all cases the Blue River is modelled as a single river channel (Fig. 1), the only exception being structure s1, where the routing is lumped rather than distributed. The spatial data for rainfall-runoff modelling are either lumped for the entire basin, lumped across the sub-basins or distributed across a uniform grid (Table 1). The sub-catchment delineation used is shown in Fig. 1 and differs only slightly from that used in other studies of the Blue River (e.g. Boyle et al., 2001). The modelling grid is identical to the NEXRAD 4 km grid (Fig. 1).

Table 1
Matrix summary of model structures used in this study
Processes: Routing equation, Unsaturated zone, Bypass flow, Drainage, Groundwater. Spatial distributions: Rainfall, Parameters, Elements.
ID | Short name | Routing equation | Unsaturated zone | Bypass flow | Drainage | Groundwater | Rainfall | Parameters | Elements
s1 | Lumped | Lumped | Conceptual | No | No | Conceptual | Lumped | Lumped | Basin
s2 | Distributed routing | Fully dynamic | Conceptual | No | No | Conceptual | Sub-basin | Lumped | Sub-basin
s3 | Muskingum | Muskingum–Cunge | Conceptual | No | No | Conceptual | Sub-basin | Lumped | Sub-basin
s4 | Distributed rainfall | Fully dynamic | Conceptual | No | No | Conceptual | Sub-basin | Lumped | Sub-basin
s5 | 3 regions | Fully dynamic | Conceptual | No | No | Conceptual | Sub-basin | 3 regions | Sub-basin
g1 | Aggregated rainfall | Fully dynamic | 1D gravity drainage | No | Yes | 2D Darcy flow | Sub-basin | 4 km grid | Grid
g2 | Gridded rainfall | Fully dynamic | 1D gravity drainage | No | Yes | 2D Darcy flow | 4 km grid | 4 km grid | Grid
g3 | No drains | Fully dynamic | 1D gravity drainage | No | No | 2D Darcy flow | Sub-basin | 4 km grid | Grid
g4 | Linear reservoir | Fully dynamic | 1D gravity drainage | No | Yes | Conceptual | Sub-basin | 4 km grid/sub-basin | Grid/sub-basin
g5 | Bypass infiltration | Fully dynamic | 1D gravity drainage | Yes | Yes | Conceptual | Sub-basin | 4 km grid/sub-basin | Grid/sub-basin
4.2. Calibration and validation
Once a particular model structure is selected the
resulting model is calibrated to the observed discharge
data for the Blue River basin. In the original DMIP
study the calibration period was defined from May
1993 to May 1999, followed by a short 13-month
validation period (Fig. 2). For the Blue River basin
very few significant flow peaks were found in the
validation period. Therefore, in this study alternative
calibration and validation periods were
chosen to test the predictive ability of the different
model structures more rigorously. A shorter calibration
period was chosen, covering a wide range of flows
and several significant peaks. Two periods with a
similar number of significant peaks to the calibration
period were chosen for validation (Fig. 2). The period
of 3 years prior to the calibration period was used first
as a warm-up during the calibration. This period was
introduced to ensure the choice of initial conditions
did not affect the resulting calibrations. Then part of
this period was also used for validation (Fig. 2).
To calibrate the different model structures an
automatic multiple objective calibration method was
used. This method is based on the shuffled complex
evolution (SCE) method (Duan et al., 1992; Madsen,
2000, 2003) using bounds on the calibration
parameters. Automatic calibration together with split
sample testing permits an objective comparison of the
different model structures. In each case the same
calibration period and calibration objectives were
used. This approach still involved the subjective
choice of which parameters to include in the
calibration and what bounds to set on the parameter
range. Experience has shown that increasing the number of degrees of freedom does not necessarily increase the predictive ability of the model. Model calibration becomes a curve-fitting problem when too many parameters are used; the simulations then no longer capture the underlying physical processes, which will often reduce the predictive power of the model. For the Blue River catchment, the only calibration data available is the discharge at the outlet. Therefore, the number of parameters used for calibration was kept low, usually between four and ten. In most cases an initial sensitivity study was carried out on a larger set to reduce the number of parameters used in calibration. In all cases bounds were defined for each of the calibration parameters. A description of the parameters used in the calibration is given in Table 3. The number of parameters and degrees of freedom used in the calibration of each model structure are listed in Table 4. For grid-based modelling, the distribution of the soils for the Blue River provided as part of the DMIP study was used directly to distribute the soil properties. Standard values of the soil capillary and hydraulic properties were used for each soil type. During calibration, the overall conductivities were scaled by a single factor, keeping the relative values the same (Table 3).

Table 2
Short description of model structures
ID | Short name | Description
s1 | Lumped | Completely lumped using the MIKE 11 NAM model concept
s2 | Distributed routing | Completely lumped with fully dynamic distributed routing, using the MIKE 11 NAM model concept
s3 | Muskingum | Sub-basin distributed rainfall with Muskingum–Cunge routing, using the MIKE 11 NAM model concept
s4 | Distributed rainfall | Sub-basin distributed rainfall with fully dynamic routing, using the MIKE 11 NAM model concept
s5 | 3 regions | Sub-basin distributed rainfall with fully dynamic routing, with 3 independent parameter regions, using the MIKE 11 NAM model concept
g1 | Aggregated rainfall | Sub-basin distributed rainfall using the MIKE SHE model concept
g2 | Gridded rainfall | 4 km NEXRAD gridded rainfall using the MIKE SHE model concept
g3 | No drains | Sub-basin distributed rainfall using the MIKE SHE model concept excluding drain flow
g4 | Linear reservoir | Sub-basin distributed rainfall using the MIKE SHE model concept for surface and unsaturated soil processes and a semi-distributed model for the sub-surface processes
g5 | Bypass infiltration | Sub-basin distributed rainfall using the MIKE SHE model concept for surface and unsaturated soil processes with a bypass conceptual model for rapid infiltration in the unsaturated zone, and a semi-distributed model for the sub-surface processes
Fig. 1. Spatial discretisation used in this study for the Blue River, OK. The figure on the left shows the 8 sub-catchments used in conceptual modelling and the parameter regions used for calibration. The figure on the right shows the 4-km NEXRAD grid used for the grid-based modelling.
Fig. 2. The calibration and validation periods used for DMIP and this study for the Blue River basin. The location of the peak events used in the DMIP study and the corresponding peak discharges are given.

Madsen (2000) compared the effect of using different objective functions and combinations of objectives on the calibration results and showed significant trade-offs in using different calibration criteria, $F_i$. The models in this study were calibrated to give the best match to two criteria: the absolute value of the average error, |AE|, and the RMSE. These represent the water balance error and the match to the shape of the hydrographs, respectively, and are often used in evaluating model performance:
$$F_1 = |\mathrm{AE}| = \left| \frac{1}{N} \sum_{i=1}^{N} \left( S_i - O_i \right) \right| \tag{1}$$

$$F_2 = \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( S_i - O_i \right)^2 } \tag{2}$$

$S_i$ is the simulated discharge at each time step and $O_i$ is the observed value. $N$ is the total number of values within the time period of analysis. For the optimisation, the two criteria are aggregated into one measure (Madsen, 2003)

$$F_{\mathrm{agg}} = \sum_{i=1}^{2} w_i \, g_i(F_i) \tag{3}$$
where $w_i$ are the weights and $g_i$ are transformation functions assigned to each measure. The transformation functions are applied to compensate for differences in magnitudes of the different measures so that all $g_i$ have about the same influence on the aggregated objective function near the optimum. The weighting uses a transformation to a common distance scale

$$g_i = \frac{F_i}{\sigma_i} + \varepsilon_i \tag{4}$$

where $\sigma_i$ is determined as the standard deviation of the $i$th objective function from the initially randomly generated sample in the SCE optimisation. Similarly, the transformation constant $\varepsilon_i$ is determined from the initial sample as

$$\varepsilon_i = \max_{j=1,2} \left\{ \min \left( \frac{F_j}{\sigma_j} \right) \right\} - \min \left( \frac{F_i}{\sigma_i} \right), \quad i = 1, 2 \tag{5}$$
where the minimum operator is taken over the initial samples and the maximum operator is taken over the two objectives. When using two objectives, the optimisation procedure will not, in general, provide a unique solution in which both objectives are optimised. A balanced optimum, where both measures contribute equally, is obtained by assigning the same weight to the two measures in Eq. (3). The general solution to the optimisation problem will consist of the Pareto set (or non-dominated solutions) due to the trade-off between the two measures. Since the SCE algorithm evolves a population of parameter sets, the algorithm will provide an approximation of the Pareto front near the point in objective function space that corresponds to the balanced optimum. The set of parameters designated as 'optimal' according to the balanced aggregated measure was selected and used to derive simulations for the validation period.

Table 3
List of parameters used in the calibrations
Parameter | Units | Description
Umax | mm | Maximum water content in surface storage
Lmax | mm | Maximum water content in root zone storage
CQOF | % | Overland runoff coefficient
CKIF | h | Time constant for interflow
CK1.2 | h | Time constant for routing overland flow
CKBF | h | Time constant for routing base flow
TOF | % | Root zone threshold value for overland flow
TIF | % | Root zone threshold value for interflow
TG | % | Root zone threshold value for groundwater recharge
Soil conductivity factor | – | Changes the global values for soil conductivity by a factor
River leakage time constant | 1/s | Coefficient governing leakage into riverbed
Drain time constant | 1/s | Coefficient governing leakage into drains
Drain depth | m | Elevation based threshold for determining drain flow
Overland Manning's M | m^(1/3)/s | Global overland flow roughness
Overland storage | m | Global overland flow threshold storage
Interflow threshold | m | Threshold for reservoir interflow
Interflow conductivity (vert) | days | Time constant governing interflow to river
Interflow conductivity (horiz) | days | Time constant governing interflow to aquifer
Groundwater conductivity | days | Time constant governing aquifer flow to river
Groundwater availability | % | Ratio of aquifer storage available to root-zone for evapotranspiration
Bypass fraction | % | Ratio of net precipitation allowed to bypass infiltration
Bypass threshold 1 | % | Threshold water content below which the bypass fraction is reduced
Bypass threshold 2 | % | Minimum water content at which bypass occurs

Table 4
Parameters used for automatic calibration for each of the model structures
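Eqs. (1) to (5) can be read as a short computational recipe: evaluate |AE| and RMSE, derive the scaling constants once from the initial random sample, then minimise the equally weighted aggregate. The sketch below follows that recipe under one stated assumption. The text does not say which standard deviation estimator is used, so the population standard deviation over the initial sample is assumed for $\sigma_i$. The function names are hypothetical and are not taken from the SCE implementation used in the study.

```python
import math
from statistics import pstdev


def abs_average_error(sim, obs):
    """F1 = |AE|, Eq. (1): absolute value of the mean error (water balance)."""
    return abs(sum(s - o for s, o in zip(sim, obs)) / len(obs))


def rmse(sim, obs):
    """F2 = RMSE, Eq. (2): root mean square error (hydrograph shape)."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def transformation_constants(initial_sample):
    """sigma_i and epsilon_i of Eqs. (4) and (5). Hypothetical helper.

    initial_sample: list of (F1, F2) tuples, one per randomly generated
    parameter set in the initial SCE population.
    """
    columns = list(zip(*initial_sample))           # all F1 values, all F2 values
    sigmas = [pstdev(col) for col in columns]      # assumption: population std. dev.
    if any(sd == 0 for sd in sigmas):
        raise ValueError("an objective has zero spread in the initial sample")
    scaled_minima = [min(col) / sd for col, sd in zip(columns, sigmas)]
    largest = max(scaled_minima)                   # max over j of min(F_j / sigma_j)
    epsilons = [largest - m for m in scaled_minima]
    return sigmas, epsilons


def aggregated_objective(objectives, sigmas, epsilons, weights=(1.0, 1.0)):
    """F_agg, Eq. (3), with g_i = F_i / sigma_i + epsilon_i from Eq. (4).

    Equal weights correspond to the balanced optimum described in the text.
    """
    return sum(w * (f / sd + eps)
               for w, f, sd, eps in zip(weights, objectives, sigmas, epsilons))


# Hypothetical (|AE|, RMSE) values for an initial sample of four parameter sets
sample = [(1.0, 20.0), (3.0, 40.0), (2.0, 25.0), (4.0, 35.0)]
sigmas, epsilons = transformation_constants(sample)
print(aggregated_objective((1.5, 22.0), sigmas, epsilons))
```

In this reading, `transformation_constants` is evaluated once on the initial population. The resulting constants are then held fixed while `aggregated_objective` is minimised, so that both transformed measures start from the same minimum on the common distance scale.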
The most important parameter in the SCE algorithm
is the number of complexes. Sensitivity tests by Duan
et al. (1994) show that the dimensionality of the
calibration problem is the primary factor determining
the proper choice of this parameter. In general, the
larger the value chosen, the higher the probability of
converging to the global optimum, but at the expense
of a larger number of model simulations. In practical
applications, one should choose the number of
complexes to balance the trade-off between the
robustness of the algorithm and the computing time.
The number of complexes used for calibration of the
different models is shown in Table 4. For the other
algorithmic parameters, the values recommended by
Duan et al. (1994) were used.
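The trade-off between robustness and computing time can be made concrete by computing the SCE population layout for n calibration parameters and p complexes. The sketch below assumes the settings usually quoted as the Duan et al. (1994) recommendations: m = 2n + 1 points per complex, q = n + 1 points per sub-complex, α = 1 and β = 2n + 1. The function name is hypothetical, and the actual settings listed in Table 4 are not reproduced here. Under these assumptions, the initial random population of p·m parameter sets is also the sample from which σ_i and ε_i in Eqs. (4) and (5) would be computed.

```python
def sce_population(n_params, n_complexes):
    """Population layout for SCE-UA under commonly cited Duan et al. (1994) settings.

    Hypothetical helper.
    n_params: number of calibration parameters (n)
    n_complexes: number of complexes (p)
    """
    if n_params < 1 or n_complexes < 1:
        raise ValueError("need at least one parameter and one complex")
    points_per_complex = 2 * n_params + 1             # m = 2n + 1
    return {
        "points_per_complex": points_per_complex,
        "points_per_subcomplex": n_params + 1,        # q = n + 1
        "alpha": 1,                                   # offspring generated per sub-complex
        "beta": 2 * n_params + 1,                     # evolution steps per complex
        "population_size": n_complexes * points_per_complex,  # s = p * m
    }


# Population size grows linearly with p: each extra complex adds 2n + 1 points
for p in (2, 5, 10):
    print(p, sce_population(n_params=8, n_complexes=p)["population_size"])
```

For eight parameters, the upper part of the four-to-ten range used here, this gives 34, 85 and 170 points for two, five and ten complexes. Every evolution loop therefore costs proportionally more model runs as p increases.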
5. Results
The results for the calibration and validation period
for each of the simulations are shown in Table 5.
This table lists the average error (AE), the percent
bias (%B), the root mean square error (RMSE), the
correlation coefficient (R) and a flow-duration curve
error index (EI) (Refsgaard and Knudsen, 1996):
R¼NXN
i¼1SiOi 2
XN
i¼1Si
XN
i¼1Oiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
NP
S2i 2
XN
i¼1Si
� 2� �
NP
O2i 2
XN
i¼1Oi
� 2� �s
ð6Þ
EI ¼ 12
ÐlfoðqÞ2 fsðqÞldqÐ
foðqÞdqð7Þ
where f_o(q) is the flow duration curve based on the
observed hourly data and f_s(q) is the flow duration
curve based on the simulated hourly data. EI
provides a measure of the difference between the
flow duration curves of the simulated and observed
hourly flows (perfect agreement for EI = 1).
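The statistics above can be computed from paired hourly series as in the following sketch. Two details are our assumptions rather than statements in the text: AE is taken as simulated minus observed (so that a negative AE indicates under-estimation), and each flow duration curve is represented by the flows sorted in descending order, so that the integrals in Eq. (7) become sums. The area between two monotone curves, and hence EI, is the same whichever axis is integrated over.

```python
import math
from typing import Sequence


def performance_stats(sim: Sequence[float], obs: Sequence[float]) -> dict:
    """AE, %B, RMSE, R (Eq. 6) and a discrete EI (Eq. 7) for paired series."""
    n = len(obs)
    if n == 0 or len(sim) != n:
        raise ValueError("sim and obs must be non-empty and equally long")
    mean_obs = sum(obs) / n
    # Average error, assumed here to be simulated minus observed.
    ae = sum(s - o for s, o in zip(sim, obs)) / n
    pct_bias = 100.0 * ae / mean_obs
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n)
    # Correlation coefficient in the computational form of Eq. (6).
    sum_s, sum_o = sum(sim), sum(obs)
    sum_so = sum(s * o for s, o in zip(sim, obs))
    sum_s2 = sum(s * s for s in sim)
    sum_o2 = sum(o * o for o in obs)
    r = (n * sum_so - sum_s * sum_o) / math.sqrt(
        (n * sum_s2 - sum_s ** 2) * (n * sum_o2 - sum_o ** 2)
    )
    # EI (Eq. 7): flow duration curves as flows sorted in descending order;
    # with equal series lengths the probability increments cancel in the ratio.
    fdc_obs = sorted(obs, reverse=True)
    fdc_sim = sorted(sim, reverse=True)
    ei = 1.0 - sum(abs(fo - fs) for fo, fs in zip(fdc_obs, fdc_sim)) / sum(fdc_obs)
    return {"AE": ae, "%B": pct_bias, "RMSE": rmse, "R": r, "EI": ei}
```

Note that R is insensitive to a constant offset, so a simulation can have R = 1 while still carrying a large bias; this is why the table reports AE and %B alongside R.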
The RMSE and EI for the calibration and validation
periods are shown as a function of correlation in
Figs. 3 and 4, respectively.
Table 5
Performance statistics for the model structures used in this study over calibration and validation periods

| ID | Short name | Cal. AE (m3/s) | Cal. %B | Cal. RMSE (m3/s) | Cal. EI | Cal. R | Val. AE (m3/s) | Val. %B | Val. RMSE (m3/s) | Val. EI | Val. R |
|----|------------|------|------|------|------|------|------|------|------|------|------|
| s1 | Lumped | −5.40 | −43.22 | 16.12 | 0.76 | 0.90 | −15.64 | −74.03 | 15.41 | 0.59 | 0.81 |
| s2 | Distributed routing | 0.01 | 0.07 | 11.86 | 0.96 | 0.92 | −5.21 | −24.68 | 14.47 | 0.85 | 0.73 |
| s3 | Muskingum | 0.24 | 1.93 | 11.18 | 0.91 | 0.93 | −5.86 | −27.74 | 12.71 | 0.87 | 0.81 |
| s4 | Distributed rainfall | −0.07 | −0.58 | 11.65 | 0.94 | 0.92 | −6.18 | −29.27 | 13.32 | 0.87 | 0.79 |
| s5 | 3 regions | −0.08 | −0.61 | 10.79 | 0.95 | 0.93 | −6.23 | −29.49 | 13.46 | 0.88 | 0.78 |
| g1 | Aggregated rainfall | 0.00 | 0.02 | 12.86 | 0.97 | 0.91 | −2.54 | −12.05 | 12.45 | 0.91 | 0.81 |
| g2 | Gridded rainfall | 0.01 | 0.04 | 13.10 | 0.97 | 0.91 | −2.62 | −12.39 | 12.41 | 0.91 | 0.81 |
| g3 | No drains | 0.00 | 0.01 | 13.95 | 0.90 | 0.91 | −3.88 | −18.34 | 13.78 | 0.93 | 0.78 |
| g4 | Linear reservoir | 0.14 | 1.09 | 16.47 | 0.88 | 0.84 | 0.56 | 2.65 | 16.67 | 0.88 | 0.66 |
| g5 | Bypass infiltration | 0.89 | 7.10 | 14.46 | 0.87 | 0.78 | 0.07 | 0.35 | 14.65 | 0.88 | 0.74 |

The statistics are defined in the text; Cal. = calibration period, Val. = validation period.

Not surprisingly, the lumped conceptual rainfall-runoff model with lumped routing (s1) performs poorly in both the calibration and validation periods when compared to the model with distributed routing (s2).
The long narrow shape of the Blue River basin makes
distributed routing important. Indeed, distributed
routing gives the most significant improvement and
is one of the dominant processes in this catchment.
Similarly the use of distributed rainfall over the
8 sub-catchments (s3, s4, s5) further improves model
performance in both the calibration and validation
periods. These observations confirm similar results for
the Blue River presented previously by Boyle et al.
(2001) using another semi-distributed modelling
approach. They conclude that improvements in
model performance are demonstrably related to the
spatial distribution of model input and streamflow
routing although they do not use the split sample test
methodology used here.

Fig. 3. Root mean square error (RMSE) and correlation (R) for the calibration and validation periods used in this study.
Fig. 4. Flow duration error index (EI) and correlation (R) for the calibration and validation periods used in this study.

It is interesting that the simple Muskingum–Cunge
routing (s3) outperforms the fully dynamic routing (s4)
when both use the same conceptual rainfall-runoff
model structure. However,
no calibration of the routing parameters was carried
out here. In both cases, a Manning number M = 30 m^(1/3)/s
(Manning's n = 1/M ≈ 0.0333 s/m^(1/3)) was selected and used
to derive the routing parameters, so these differences
may simply arise from the choice of this value;
this should be explored further.
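The text does not say how the routing parameters were derived from M. As one possible illustration, the sketch below computes the classical Muskingum–Cunge travel time K and weighting factor X for a wide rectangular reach from a reference discharge and Manning's M. The channel width, slope, reference discharge and reach length are hypothetical, and this is not necessarily the procedure implemented in the modelling framework.

```python
import math


def muskingum_cunge(q_ref: float, width: float, slope: float,
                    manning_m: float, dx: float) -> tuple:
    """Return (K in s, X dimensionless) for a wide rectangular reach.

    q_ref: reference discharge (m3/s); width: channel width (m);
    slope: bed slope (-); manning_m: Manning number M (m^(1/3)/s);
    dx: reach length (m).
    """
    # Normal depth from Manning's equation for a wide channel
    # (hydraulic radius ~ depth): Q = M * B * h^(5/3) * S0^(1/2).
    depth = (q_ref / (manning_m * width * math.sqrt(slope))) ** 0.6
    velocity = q_ref / (width * depth)
    celerity = 5.0 / 3.0 * velocity  # kinematic wave celerity
    k = dx / celerity                 # travel time through the reach
    # Weighting factor chosen so that the numerical diffusion of the scheme
    # matches the physical diffusion of the diffusive wave.
    x = 0.5 * (1.0 - q_ref / (width * slope * celerity * dx))
    return k, x


k, x = muskingum_cunge(q_ref=50.0, width=30.0, slope=0.001, manning_m=30.0, dx=2000.0)
print(f"n = 1/M = {1 / 30.0:.4f}, K = {k:.0f} s, X = {x:.2f}")
```

A larger M (a smoother channel) gives a faster wave and a smaller K, which illustrates how a single, uncalibrated choice of M feeds directly into the timing of the routed hydrograph.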
In the validation period the completely lumped
model in fact has one of the highest correlations
(R = 0.81, equal to that of s3, g1 and g2) but also
by far the largest bias and one of the largest RMSEs.
Best performance in the validation period is achieved
by (s3), using Muskingum–Cunge routing, and by (g1)
and (g2), using fully dynamic routing and the grid-based
model structure. The structures (g1) and (g2) use the same
gridded model structure with spatially distributed soil
properties but different spatial distributions of rainfall.
In (g1) the NEXRAD rainfall is lumped or aggregated
over the sub-catchments defined in Fig. 1, while (g2)
uses the individual values of the NEXRAD grid
rainfall. It appears that using fully distributed rainfall
data does not improve the simulation accuracy in
either the calibration or validation period. The fact
that the additional information about the spatial
variability of the rainfall did not provide any
additional benefit is interesting. These results are, of
course, specific to this model structure and the Blue
River basin and this may well be a limitation of the
model structure itself. For example while the soil
properties are spatially distributed, the calibration was
lumped by using a single calibration factor for all soil
conductivities. Variations between individual cells or
map units were not permitted in the calibration
process. This was done to avoid over-parameterisation,
but it may limit the model's ability to utilise the grid
rainfall. Alternatively, the spatial resolution of
the model grid may be too coarse to capture the
underlying variability in runoff processes, smoothing
out the response to distributed rainfall. It may also be
that the rainfall input is biased, for example
under-estimated, so that model performance is
limited. On the other hand,
it may well be that for simulating the catchment
flows the spatial resolution achieved using the
sub-catchment distribution is sufficient to capture
the runoff behaviour given the shape of the catchment.
Boyle et al. (2001) found that the simulation of the
peaks for the Blue River is controlled more strongly
by the spatial representation of the routing than by
the spatial representation of the precipitation. They
also found, using a semi-distributed model, that while
an improvement in model performance was obtained
using 3 sub-catchments, no additional benefit
was gained by increasing the number of sub-catchments
to 8.
This discussion highlights the fact that there is no
straightforward relationship to determine the level of
spatial resolution with which to represent spatial
variability in order to obtain accurate simulations and
to take full advantage of distributed modelling using
distributed rainfall. This is particularly the case where
the only information available for calibration or
model testing is the discharge at the basin outlet.
One of the main advantages of a distributed modelling
approach is the ability to make predictions internally
within a catchment. However, where no information
is available concerning the internal states of the
catchment, satisfactory predictions of the outlet flows
can be made using less detailed spatial representations
of the catchment processes, including rainfall. Indeed it can
be argued that one should strive to use only the
simplest possible model representation if the purpose
of the model is only to represent flows at the outlet.
Taking the calibration and validation periods together,
the best overall model structure is (s3).
This is also one of the simplest structures and typical
of those used in many flood forecasting applications.
This discussion also highlights the fact that the
various components of model uncertainty are strongly
interlinked. Uncertainty in the model simulations is
dependent on uncertainty in the forcing terms,
the model parameters and the model structure.
The model calibration process attempts to minimise
simulation errors using parameter estimation
conditional on the observations and on the model
structure. Where the input, output, and model
structure are subject to both systematic and random
errors then this calibration process could result in
parameter choices that attempt to compensate for
these uncertainties or constrain the calibration
process. Assigning simulation error to a single source
is therefore difficult. Rajaram and Georgakakos
(1989) propose a general framework that breaks
down the errors into the different components
including model structure errors. They argue that if
all other errors are accounted for, then any large
residual errors that are detected can be related to
potential structural error. However, as pointed out in the
same paper, this cannot be confirmed unless some
other structure is proposed that performs better; such
residuals could just as well arise from inadequate
representation of the errors in the forcing terms (e.g. rainfall),
measurement error, or limitations in the parameter
estimation process. The resulting simulation errors are
therefore a combination of these. Nevertheless, it is
interesting to estimate how large the model structure
errors can plausibly be in relation to the other sources.
Later in this paper the different calibrated model
structures examined here are used to estimate the
sensitivity of the Blue River simulations to variations
in model structure in an attempt to determine the
order of magnitude of model structure uncertainty.
These estimates are conditioned on the observations
via the calibration process.
The best performing model structure in the
calibration period is the (s5) model. This model is
the most complex conceptual representation using
spatially distributed catchment properties as well as
distributed routing and distributed rainfall.
Three parameter regions were used to define the
distributed model parameters (Fig. 1). However, in
the validation period the performance is not nearly as
good. One explanation is that there is no benefit to
model performance associated with the distribution of
surface characteristics. Boyle et al. (2001) proposed
this and suggested that the impact of soil and
vegetation parameters is averaged out by the time
the flow reaches the outlet. However, this structure
has the largest number of calibration parameters and
the poor validation suggests that this model is
over-parameterised. This highlights the fact that
such split sample testing is essential in examining
the performance of different model structures and the
benefits of different levels of resolution of the spatial
variability in the forcing, the parameters and the
model structure.
In the calibration period the simpler, semi-distributed
conceptual rainfall-runoff models perform
consistently better than the more complex grid-based
formulations. However, in the validation period,
where low flows predominate, there
is no clear pattern and quite different model structures
perform equally well (Fig. 3).
The EI statistics provide some indication of how
well the model captures the distribution of flows found
in the streamflow record (Fig. 4). The flow duration
curves for the different model structures, Fig. 5,
show that while the models appear to match the
observed flow distribution reasonably well in
the calibration period, there is a general tendency to
underestimate the larger, less frequent flows in the
calibration and especially the validation period.
There is also a general tendency to underestimate
the flows in the validation period, where low flows
appear to be more likely. It is not clear whether
this is a limitation of the structures used here or
reflects, for example, some underestimation or bias in
the rainfall input.

Fig. 5. Measured and simulated flow duration curves for the calibration and validation periods.
Fig. 6. Examples of the model structure ensemble and the estimated measurement uncertainty for two major peaks, one in the calibration period and one in the validation period.

In general the grid-based model structures provide a better match to the flow
duration curves, which suggests that they may be
better at representing the full range of flows rather
than just the peaks.
Perhaps the most interesting observation is that
quite different model structures perform almost
equally well in the validation period but as can be
seen from the simulated flood hydrographs,
particularly event 5 in Fig. 6, they predict quite
different catchment response to rainfall. In the same
manner that more than one set of model parameters
can be found to satisfactorily represent catchment
flows, it appears that more than one model structure can
also be found to represent catchment flows. The fact
that the predicted hydrographs are quite different
would suggest that it should be feasible to identify the
structures that best capture the important hydrological
mechanisms occurring in the catchment but this is not
reflected in the performance assessment. We believe
this is primarily a result of the limited amount of
information in the outflow hydrograph alone but is
also dependent on the calibration objectives used.
Further information, such as internal gauging stations,
is required if our goal is to determine the important
hydrological mechanisms acting within a catchment
as examining only the discharge at the outlet may
mask these.
We would also argue that future investigations of
equifinality be extended to include both parameter
sets and model structures that provide equally
plausible simulations. Indeed, in his description of
equifinality, Beven (2000) refers to both different
parameter sets and different model structures.
However, few studies have considered different
model structures, presumably because of the
computational requirements but also because of the
difficulty of using different model structures from
different modelling tools.
6. Uncertainty in the discharge measurement
Even where model performance is comparable,
there appears to be quite a substantial variation in the
hydrographs produced by the different model
structures. This variation of the different structures
can be used as an estimate of the uncertainty in
determining the most appropriate model structure.
Assuming that each model structure is equally likely
the results can be treated in a probabilistic framework.
This approach is adopted in Georgakakos et al. (2004)
for the DMIP models. Fig. 6 shows the ensemble of
simulations obtained by using all the model structures
in this study for two significant events with similar
peak discharges, one in the calibration period and one
in the validation period. For comparison, the
ensembles are shown together with the uncertainty
in the discharge measurement. A reasonable estimate
of the uncertainty in measured discharge for normal
flows is about 10%. For standard stream gauging
methods, the World Meteorological Organisation (WMO)
(1994) estimates the measurement uncertainty of
gauged streamflows as 5% (standard error at the 95%
confidence level). If a rating curve is used to
estimate flows then additional uncertainty may arise
from looped rating effects or poor resolution over
some intervals due to the lack of gauging. This is often
the case for very large flow events or flooding where
extrapolation of the rating curve is required or flow
outside the main channel occurs and, therefore,
larger uncertainties can be expected for peak events.
Therefore, an uncertainty of 10% at the 95% confidence
level is used here.
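The ±10% band used here is simple to construct. The sketch below builds it around an observed hydrograph, counts how often a simulation falls inside it, and, for reference only, converts a relative standard error into a 95% half-width under a Gaussian assumption (a 5% standard error gives 1.96 × 5% ≈ 9.8%). The Gaussian assumption and the helper names are ours, not the source's.

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def band_95(q_obs, rel_half_width=0.10):
    """(lower, upper) 95% bounds on each observed discharge value."""
    return [(q * (1.0 - rel_half_width), q * (1.0 + rel_half_width)) for q in q_obs]


def half_width_from_std_error(rel_std_error):
    """95% relative half-width implied by a relative standard error (Gaussian)."""
    return Z_95 * rel_std_error


def fraction_inside(q_sim, bounds):
    """Share of time steps at which a simulation lies inside the bounds."""
    inside = sum(lo <= q <= hi for q, (lo, hi) in zip(q_sim, bounds))
    return inside / len(bounds)
```

Applying `fraction_inside` to each ensemble member gives a simple way to compare the model structure spread with the measurement band at the peaks.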
The variation produced by the different model
structures is wider than the estimated measurement
uncertainty for the peaks but there are still variations
in the observed hydrographs that are not reproduced
by any of the ensemble members. This could still be
due to an inadequate or biased model structure but
may also result from the other sources of uncertainty.
It is interesting to compare this with DMIP ensemble
results shown by Georgakakos et al. (2004).
Comparing our Fig. 6 with their Fig. 2, the variations
appear to be similar in magnitude, but the ensemble
variations amongst the DMIP models are slightly
larger. As shown by Georgakakos et al. (2004), combinations
of different ensemble members may outperform
hydrological simulations from the best single model.
One possible interpretation is that the combined
model draws on structural features of the different
models that are not found in any individual model.
Stated another way, different model structures may
better represent different parts of the catchment response
and therefore improve the overall performance when
combined. This appears plausible from Fig. 6.
It is of interest to compare the order of magnitude
of the model structure uncertainty with the other sources
of uncertainty in deterministic hydrological modelling,
namely parametric uncertainty and uncertainty in the
model input or boundary conditions.
7. Parametric uncertainty
The calibrated models derived here are ‘optimal’ in
the sense that they represent an acceptable trade-off
between absolute average error and RMSE. Beven
and co-workers (Beven, 2000) argue that the concept
of an optimum parameter set is ill-founded. In the case
of the Blue River, where the only calibration data are the
discharge at the outlet, many acceptable parameter sets
may occur in quite different parts of the
parameter space. This will be the case, for example,
if the model is not sensitive to one or
more parameters, or if two subsets of parameters can
provide the same response through different
mechanisms. Instead, the concept of equifinality is
introduced where many possible parameter sets
and model structures may provide similar simulations
of the catchment response.
The crucial point then is how to define acceptable
simulations. In this study the ‘optimal’ simulations are
a trade-off between objectives that measure the water
balance and hydrograph shape. The estimated Pareto
fronts obtained from the calibration of the different
model structures are shown in Fig. 7. These fronts
were defined from the sample of parameters
investigated during calibration of the balanced
aggregated objective function, i.e. they provide an
approximation of the Pareto front near the balanced
optimum.
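Because the fronts are approximated from the parameter sets sampled during calibration, each one can be extracted as the non-dominated subset of those samples. A minimal sketch for the two minimised objectives |AE| and RMSE follows; the function name and sample values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (|AE|, RMSE), both to be minimised


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated subset of the points, sorted by increasing |AE|."""
    front: List[Point] = []
    best_rmse = float("inf")
    # Sweep in order of increasing |AE| (ties broken by RMSE): a point is
    # non-dominated only if its RMSE is strictly below every point seen so far.
    for ae, rmse in sorted(points):
        if rmse < best_rmse:
            front.append((ae, rmse))
            best_rmse = rmse
    return front
```

The single sweep works because, once the points are ordered by |AE|, a point can only be dominated by one that precedes it in that order.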
In choosing an acceptance criterion, both the calibration
objectives and the measurement uncertainty should be
considered. One possible estimate of an acceptable
deviation in RMSE is 10% (the estimated
uncertainty in the flow measurement) of the average
flow. In the calibration period the average flow
is 12 m3/s. Taking (s3) as an illustrative model structure,
all parameter sets within 1.2 m3/s of the Pareto front
constitute acceptable parameter sets (Fig. 7). As the biases are
quite low, an upper bound of approximately
0.4 m3/s on the bias is used here to close this acceptance region.
Parameter sets within this region are considered to be
equally acceptable for predicting streamflow.
Using this criterion, two other related model structures
also fall in this acceptance region.

Fig. 7. Pareto fronts derived from the calibration of the model structures used in this study. Parameter sets within the shaded region were used to define the parametric uncertainty for the single, reference model structure s3. The location of the calibration optimum for each structure is shown by the large symbol.
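The acceptance test can be sketched as follows. It assumes that "within 1.2 m3/s of the Pareto front" means an RMSE no more than 0.10 × 12 m3/s = 1.2 m3/s above the front at the same |AE|, with the front interpolated linearly between its points, and that the 0.4 m3/s bound applies to |AE|. The function names and front values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Point = Tuple[float, float]  # (|AE|, RMSE) in m3/s


def front_rmse(front: List[Point], ae: float) -> float:
    """Piecewise-linear front RMSE at a given |AE| (front sorted by |AE|)."""
    aes = [p[0] for p in front]
    if ae <= aes[0]:
        return front[0][1]
    if ae >= aes[-1]:
        return front[-1][1]
    i = bisect_left(aes, ae)
    (a0, r0), (a1, r1) = front[i - 1], front[i]
    return r0 + (r1 - r0) * (ae - a0) / (a1 - a0)


def is_acceptable(sample: Point, front: List[Point], mean_flow: float = 12.0,
                  rel_tol: float = 0.10, max_abs_ae: float = 0.4) -> bool:
    """True if the sample lies inside the assumed acceptance region."""
    ae, rmse = sample
    delta_rmse = rel_tol * mean_flow  # 0.10 * 12 m3/s = 1.2 m3/s
    return ae <= max_abs_ae and rmse - front_rmse(front, ae) <= delta_rmse
```

Tying the RMSE tolerance to the measurement uncertainty keeps the acceptance region from being tighter than the data can justify.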
The location of the Pareto fronts along the RMSE
axis shows the sensitivity of simulation accuracy to
model structure for the calibration period. The size of
the acceptance region for a single model in this case is
small compared to the sensitivity with respect to the
model structure in an RMSE sense. From a practical
point of view, this figure suggests that there is as much
scope for improving simulation accuracy by exploring
different model structures as by exploring
different parameter sets. However, in practical
hydrological studies the effects of different model
structures are seldom evaluated.
A simple estimate of the parametric uncertainty for
the model structure (s3) can be obtained from the
generated parameter sets of the SCE optimisation.
A representative ensemble of the acceptable
parameter sets was selected using a simple Latin
hypercube sampling approach, dividing the
acceptance region into 25 areas and randomly
selecting, within each area, one parameter set generated
during the SCE optimisation. Strictly speaking, the resulting
variation produced by this ensemble is a measure of
model sensitivity (Fig. 8). The figure is generated by
taking the upper and lower bounds of the ensemble
as a measure of the parametric uncertainty.
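The selection of the 25 parameter sets and the resulting bounds can be sketched as below. The text does not give the layout of the 25 areas; the sketch assumes a 5 × 5 grid over |AE| and the RMSE distance above the Pareto front, with one randomly chosen SCE sample per non-empty cell. This is stratified sampling, a simplification of a strict Latin hypercube design, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (|AE|, RMSE above the Pareto front, simulated hydrograph) for one accepted set
Sample = Tuple[float, float, Sequence[float]]


def stratified_pick(samples: List[Sample], ae_max: float, offset_max: float,
                    n_bins: int = 5, seed: int = 0) -> List[Sample]:
    """Pick at most one sample from each cell of an n_bins x n_bins grid.

    Assumes 0 <= |AE| <= ae_max and 0 <= offset <= offset_max for every sample.
    """
    rng = random.Random(seed)
    cells: Dict[Tuple[int, int], List[Sample]] = {}
    for s in samples:
        ae, offset, _ = s
        i = min(int(ae / ae_max * n_bins), n_bins - 1)
        j = min(int(offset / offset_max * n_bins), n_bins - 1)
        cells.setdefault((i, j), []).append(s)
    return [rng.choice(members) for _, members in sorted(cells.items())]


def envelope(hydrographs: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Lower and upper bounds of an ensemble at each time step."""
    lower = [min(vals) for vals in zip(*hydrographs)]
    upper = [max(vals) for vals in zip(*hydrographs)]
    return lower, upper
```

Applying `envelope` to the hydrographs of the picked parameter sets gives upper and lower bounds of the kind plotted as the parametric uncertainty.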
The ensemble range is similar in magnitude to the
observation uncertainty. The variations caused by
parametric uncertainty for this structure appear to
principally affect the magnitude of the peaks.
The variations in model structure provide more varied
hydrological response including the shape and timing
of the peaks.
This argues strongly for extending the model
calibration process usually carried out in hydrological
simulations to not only address parameter choices
within a particular model structure but also to address
the choice of both model structure and corresponding
parameters. While automatic calibration methods
seldom extend to exploring different model structures,
the new modelling framework described here
represents a first step in this direction. The split
sample testing used here could provide a general
methodology for evaluating the different model
structures in such a process. As pointed out above,
the usefulness of this process in identifying the
important hydrological flow mechanisms will depend
on the availability of calibration data other than the
discharge at the basin outlet, or on a more detailed
analysis of parts of the catchment response.
8. Rainfall uncertainty
It is expected that for a well-calibrated
hydrological model that adequately represents the
important runoff processes within the catchment,
the major factor contributing to the uncertainty in the
predicted flows is the uncertainty in rainfall. This is
confirmed to a large extent by our own experience in
many practical hydrological modelling studies.
Several authors suggest that this is the most important
contribution to model uncertainty. For example,
Refsgaard et al. (1983) show, using a Kalman filter
version of the NAM model, that the variations due to
uncertainty in rainfall estimation are significantly
larger than the uncertainty due to parameter
variations. However, this conclusion depends strongly
on catchment size and response time, the model and
the assumptions made in representing the different
sources of uncertainty.
The uncertainty in the rainfall may arise from
instrument bias or error, inadequate spatial or
temporal resolution and in the case of forecasts,
the inherent chaotic nature of weather systems.
A detailed analysis of the uncertainties and biases
in the NEXRAD precipitation has been carried out
elsewhere (Smith et al., 2004) and is beyond the scope
of this study. Nevertheless, for the purposes of this
paper it is of interest to estimate the order of
magnitude of the uncertainty in streamflow
simulations due to uncertainties in the precipitation
input and to compare this to the other sources of
uncertainty.
As a first approximation only sub-catchment
rainfall aggregated from the radar rainfall is
considered and the uncertainty is assumed spatially
independent from sub-catchment to sub-catchment
and independent in time for the hourly values used.
Within each sub-catchment the rainfall uncertainty is
assumed to have the following simple structure:
$$ P'_i = P_i + a, \qquad a \sim N(0, \sigma) \qquad (8) $$

where P_i is the ith sub-catchment rainfall based on
radar observations, P'_i is the perturbed rainfall for the
ith sub-catchment and a is a random error. The value
of a is selected from a normal distribution N(0, σ)
with zero mean and standard deviation σ. As a first
approximation, the standard deviation was assumed
to be linearly related to the sub-catchment
rainfall, i.e. σ = R P_i. For the purpose of estimating
the order of magnitude of the impact of rainfall
uncertainties, a value of R = 0.5 was selected. For this
approximate uncertainty model, the uncertainty
becomes zero when no rainfall is recorded by the
radar. Where negative values of P'_i are generated,
the rainfall is set to zero. For the parameters selected
here, this will affect only a few percent of the sample
distribution. The above uncertainty model
assumes that the rainfall uncertainties are independent in
space and time.

Fig. 8. Examples of the model structure ensemble and parametric measurement uncertainty for two major peaks, one in the calibration period and one in the validation period. The dark solid lines show the upper and lower bounds of the model parameter ensemble and the reference model structure s3.
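Eq. (8) and the clipping rule translate directly into code. Because σ = R P_i, the probability that a perturbed value falls below zero does not depend on P_i and equals Φ(−1/R), where Φ is the standard normal distribution function; for R = 0.5 this is Φ(−2) ≈ 2.3%, consistent with the "few percent" quoted above. The function names are hypothetical.

```python
import random
from statistics import NormalDist
from typing import List


def perturb_rainfall(p: List[float], r: float, rng: random.Random) -> List[float]:
    """Eq. (8): P'_i = P_i + a with a ~ N(0, sigma), sigma = R * P_i, clipped at zero."""
    perturbed = []
    for p_i in p:
        a = rng.gauss(0.0, r * p_i)  # sigma = 0 when no rain is recorded
        perturbed.append(max(0.0, p_i + a))
    return perturbed


def clipped_fraction(r: float) -> float:
    """P(P_i + a < 0) for P_i > 0, i.e. Phi(-1/R); independent of P_i."""
    return NormalDist().cdf(-1.0 / r)


print(f"R = 0.5: {100 * clipped_fraction(0.5):.1f}% of perturbed values set to zero")
```

Clipping also makes the perturbed rainfall slightly biased upwards on average, since negative draws are replaced by zero rather than discarded; at R = 0.5 this effect is small.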
To determine how uncertainties in the measured rainfall
propagate, the transformation of the rainfall uncertainty
into uncertainty in the runoff is calculated by a Monte Carlo
approach using a total of 200 samples. Fig. 9 compares the
model structure ensemble with the 95% confidence
intervals for uncertainty in the simulated discharge
obtained from this Monte Carlo procedure for the two
flood events considered earlier. These confidence
intervals were estimated using the 'optimal' model
structure (s3), which is shown as a dark line
between the two bounds. The Monte Carlo procedure
was carried out over the entire simulation period used
in the DMIP study (Fig. 2). It is assumed that the
uncertainty in the precipitation input is the only
source of uncertainty.
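The Monte Carlo propagation can be sketched as follows. For brevity it uses a single sub-catchment and a hypothetical linear reservoir in place of the calibrated s3 model, and it takes the 2.5% and 97.5% sample quantiles of the 200 realisations at each time step as the 95% interval; these simplifications are ours.

```python
import random
from statistics import quantiles
from typing import Callable, List, Sequence, Tuple


def linear_reservoir(rain: Sequence[float], k: float = 0.2) -> List[float]:
    """Hypothetical stand-in for the calibrated s3 model: one linear reservoir."""
    storage, flow = 0.0, []
    for p in rain:
        storage += p
        q = k * storage  # outflow proportional to storage
        storage -= q
        flow.append(q)
    return flow


def monte_carlo_bounds(rain: Sequence[float],
                       model: Callable[[Sequence[float]], List[float]],
                       r: float = 0.5, n_samples: int = 200,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """2.5% and 97.5% quantiles of simulated flow under Eq. (8) rainfall errors."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_samples):
        # Errors drawn independently at every time step (one sub-catchment here).
        perturbed = [max(0.0, p + rng.gauss(0.0, r * p)) for p in rain]
        runs.append(model(perturbed))
    lower, upper = [], []
    for values in zip(*runs):  # all realisations at one time step
        cuts = quantiles(values, n=40, method="inclusive")  # cut points in 2.5% steps
        lower.append(cuts[0])
        upper.append(cuts[-1])
    return lower, upper


rain = [0.0, 5.0, 10.0, 2.0, 0.0, 0.0]
low, high = monte_carlo_bounds(rain, linear_reservoir)
```

With several sub-catchments, the same loop would perturb each sub-catchment series independently, matching the spatial independence assumed in the text. Because independent hourly errors partly cancel as the catchment integrates rainfall over time, such a model tends to understate the effect of correlated or biased rainfall errors.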
In this case it appears that a 50% relative standard
deviation in the precipitation estimate (R = 0.5) has
only a limited impact on the accuracy of the
hydrological simulation when compared to the flow
measurement uncertainty and the other sources of
uncertainty. These results are again specific to the
model structure (s3) for the Blue River basin.
This impact is certainly insufficient to explain the deviations
seen between the observed and simulated flows.
It should be recognised that this figure shows the
sensitivity of the selected model structure to
uncertainty in the rainfall input, conditioned on the
observations by the calibration process. The uncertainty
model used here does not account for
the significant biases that appear particularly in
radar-based precipitation measurement, and this is
probably the largest contributing factor. Seo et al.
(2003) suggest that there are significant systematic and
event-to-event biases in the precipitation data, which
should result in volume biases in the modelled
discharges. Perhaps more importantly, the temporal
and spatial correlations are neglected. A more
thorough characterisation of the radar rainfall
uncertainties, including their spatial and temporal
correlation for the Blue River is required to verify
whether the low sensitivity to rainfall error found here
realistically represents the catchment response.
9. Model structure ensembles
An original outcome of the DMIP study, described
in Georgakakos et al. (2004), is the application of
multimodel ensembles for hydrological simulation.
In particular, their analysis of the DMIP models
showed that combinations of different models provide
more reliable hydrological simulations than simulations
obtained from the single best performing
models. This, however, was not tested using an
independent validation period. Here we derive, in a
similar manner, new models from different
combinations of the model structures used
in this study. Fig. 10 shows the results for combinations
of 2, 3, …, 10 different model structures. The model
simulations are weighted equally in all combinations;
one could also weight the different models according
to goodness-of-fit, as in the GLUE methodology. In the
calibration period, many of the combinations of model
structures perform as well as or better than the best
single model. More interestingly, in the
validation period many of the combinations of 2, 3
and 4 models still provide improvements in simulation
accuracy when compared to the best single
model. We also found that the ensemble average of all
10 model structures performs better than any single
model. These results for the split sample validation
provide strong confirmation of the conclusion of
Georgakakos et al. (2004) that multimodel
(multiple model structure) ensembles may provide
important benefits for hydrological simulation.
The improvement appears to arise from
the differences in hydrological response of the
different model structures, which, when combined,
provide a better match to the observed flows.

Fig. 9. Examples of the model structure ensemble and the rainfall input uncertainty for two major peaks, one in the calibration period and one in the validation period. The dark solid lines show the upper and lower bounds of the rainfall uncertainty ensemble for the reference model structure s3.
Fig. 10. Root mean square error (RMSE) and correlation (R) for the calibration and validation periods used in this study for model structure ensembles. The performance of combinations of 2, 3, 4, 5, 6, 7, 8, 9 and 10 model structures is compared to the performance of the individual structures (1 s).
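The equally weighted combinations can be evaluated exhaustively, since 10 structures give only 2^10 − 11 = 1013 combinations of two or more members. The sketch below averages every combination of a given size and ranks the combinations by RMSE; the model names and hydrographs in the test are synthetic, chosen so that two members with opposite biases cancel.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def rmse(sim: Sequence[float], obs: Sequence[float]) -> float:
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def combination_rmse(sims: Dict[str, Sequence[float]], obs: Sequence[float],
                     size: int) -> List[Tuple[Tuple[str, ...], float]]:
    """RMSE of the equally weighted mean of every `size`-member combination,
    sorted from best to worst."""
    results = []
    for combo in combinations(sorted(sims), size):
        members = [sims[name] for name in combo]
        mean_sim = [sum(values) / size for values in zip(*members)]
        results.append((combo, rmse(mean_sim, obs)))
    return sorted(results, key=lambda item: item[1])
```

Replacing the equal weights with likelihood-based weights would give the GLUE-style alternative mentioned above; in either case the combinations should be ranked on the validation period as well as the calibration period.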
10. Discussion and conclusions
A modelling framework has been developed to
explore different hydrological model structures in
both the rainfall-runoff and channel routing
components of the model. This framework
was then used in a systematic analysis of the
performance of a number of different model structures
for a particular basin, the Blue River, a test basin in
the Distributed Model Intercomparison Project
(DMIP), organised by the Office of Hydrologic
Development of the US National Weather Service.
Split sample testing was carried out and a variety of
performance measures were applied to evaluate the
ability of the different structures to simulate
streamflow. An automatic multiple objective
calibration procedure was used to determine an
‘optimal’ parameter set and to define the Pareto fronts
for each structure. The absolute value of the average
error and RMSE were used as the calibration criteria.
These ‘optimal’ models for each model structure were
used to create model structure ensembles to evaluate
model structure uncertainty and to evaluate the
performance of combinations of different model
structures.
The main findings are:
1. There are large variations in the model
performance amongst the selected model structures
used in this study. For the case of the Blue River
Basin, the model performance appears to be quite
sensitive to model structure. Substantial
improvements in model performance were
obtained in the Blue River by evaluating different
model structures. For practical hydrological
predictions, this suggests that there can be important
benefits in exploring different model structures
as part of the modelling approach. The modelling
framework developed here allows the hydrologist
to adopt such an approach and evaluate the
performance of different structures before a final
selection is made. The split sample testing
methodology can be used for this evaluation.
2. Using the RMSE and correlation criteria, an
optimal model structure (s3) was found that
performed best in both the calibration and
validation periods. This is one of the simplest
distributed model structures used. However, some
of the more complex grid-based rainfall-runoff
model structures were better at matching the
distribution of flows and performed as well or
better in the validation period. Ten quite different
model structures were evaluated in this study but
there is considerable scope for evaluating
other model structures and obtaining further
improvements in accuracy. Ideally a general
model calibration procedure should include both
model structure and parameter adjustment.
3. The performance of the sub-catchment based
conceptual models (s1–s5) confirms the results
from earlier work by Boyle et al. (2001) for the
Blue River. It was found that distributed routing
and distributed rainfall information increase the
simulation accuracy and predictive capability of
the model. This is more strictly tested here using
the split sample validation. The distributed
routing gave the largest improvement of the two
in this long narrow catchment. The (s5) model,
which also includes spatially distributed catchment
parameters, provided an excellent calibration but
was outperformed by other models in the
validation. This might suggest, as found by Boyle
et al. (2001), that the spatial distribution of
parameters does not improve performance.
However, it is likely that this model is simply
over-parameterised. The relationship between
model performance and the level of spatial
distribution used requires further study.
Split sample validation should be used to ensure
strict evaluation of the results. This also suggests
that there is an optimal number of calibration
parameters. Below this optimum not all the
processes are captured properly, while above this
optimum there is a good fit to the calibration data
but poor predictive power.
4. In a number of cases it appears that there is a
trade-off, where increasing the model complexity
does not increase model performance. The fully
dynamic routing (s2) does not outperform the
simpler Muskingum–Cunge routing (s3) when
the same conceptual rainfall-runoff description is
M.B. Butts et al. / Journal of Hydrology 298 (2004) 242–266 263
used. The (s5) structure with the largest number of
parameters and the greatest level of spatial
distribution of the sub-catchment-based structures
performs best in the calibration period but not
nearly as well in the validation period. Finally, the
additional information in each of the NEXRAD
rainfall cells used in (g2) appears to provide
minimal, if any, benefit in simulation accuracy when
compared to the same model structure using
NEXRAD rainfall aggregated over sub-basins
(g1). However, as pointed out in the discussion,
the reasons why increased complexity does not
necessarily lead to better simulations may lie in
several limitations. These include limitations in
the model structure itself, limitations in the
information available in the calibration data,
limitations in the accuracy and representativeness of
the rainfall, or limitations in the parameter estimation
procedures. Further exploration of these
limitations is needed to derive, for example,
maximum benefit from the radar rainfall.
5. With the possible exception of the completely
lumped structure, the model structures selected
here are all plausible choices for hydrological
modelling of the Blue River basin. This study used
this range of plausible model structures as an
estimate of uncertainty due to model structure.
The resulting ensemble of model structure
simulations shows significant variations in the
shape and timing of the flood peaks. Uncertainty in
the measurement data was estimated together with
parametric and rainfall input uncertainty.
The parametric uncertainty was estimated from
the sensitivity of the simulations for a particular
model structure to choices of parameter sets that
provide the same level of performance. The impact
of rainfall uncertainty was estimated using a Monte
Carlo approach based on somewhat crude
assumptions about the properties of the rainfall
uncertainty. It was found that the sensitivity of
streamflow simulations to variations in
acceptable model structure was at least as large
as the uncertainty arising from parameters and
measurements. The model simulations
appeared to be less sensitive to rainfall uncertainty;
however, this was examined for only one model
structure. Furthermore, since the results are
based on rather simple assumptions regarding
the rainfall uncertainty that neglect biases and
other influences, such as the uncertainty
in evapotranspiration, further investigation of the
impacts of uncertainty in the model boundary
conditions is recommended.
6. Finally, the performance of new models derived by
combining the results of two or more of the
different model structures was evaluated.
The results show that many combinations could
be found that performed as well as or better than
the best individual model structures. We found that
the ensemble average of all 10 model structures
performs better than any single model.
These results, obtained using split sample validation,
strongly confirm the conclusion of
Georgakakos et al. (2004) that multimodel
(multiple model structure) ensembles may provide
important benefits for hydrological simulation.
One interpretation of this result is that the different
model structures capture different aspects of the
catchment response and, therefore, more aspects of
the catchment response are captured in the
combined model.
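The split sample evaluation behind findings 2 and 6 can be illustrated with a short sketch: RMSE and correlation are computed separately over a calibration period and a validation period, and an ensemble average of several structures is scored the same way. This is a hypothetical illustration, not the study's code. The structure names, flow values, split point and helper functions are all invented for the example, and the "ensemble average" is assumed here to be a simple equal-weight mean at each time step.

```python
import math
from statistics import mean


def rmse(sim, obs):
    """Root mean square error between simulated and observed flows."""
    return math.sqrt(mean((s - o) ** 2 for s, o in zip(sim, obs)))


def correlation(sim, obs):
    """Pearson correlation coefficient between simulated and observed flows."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (ss * so)


def ensemble_mean(members):
    """Equal-weight average of several model-structure simulations at each time step."""
    return [mean(values) for values in zip(*members)]


def split_sample_scores(sim, obs, split):
    """Score a simulation on the calibration period (indices < split)
    and, separately, on the validation period (indices >= split)."""
    periods = {"calibration": slice(None, split), "validation": slice(split, None)}
    return {
        name: (rmse(sim[p], obs[p]), correlation(sim[p], obs[p]))
        for name, p in periods.items()
    }


if __name__ == "__main__":
    # Hypothetical flows (m3/s) for three invented structures; not data from the study.
    obs = [10.0, 12.0, 30.0, 22.0, 15.0, 11.0, 28.0, 20.0]
    members = {
        "structure_a": [11.0, 13.0, 34.0, 24.0, 16.0, 12.0, 33.0, 23.0],
        "structure_b": [9.0, 11.0, 26.0, 20.0, 14.0, 10.0, 24.0, 18.0],
        "structure_c": [10.0, 10.0, 25.0, 27.0, 16.0, 11.0, 22.0, 26.0],
    }
    # Equal-weight ensemble built from the three individual structures.
    members["ensemble_mean"] = ensemble_mean(list(members.values()))
    split = 4  # first half treated as calibration, second half as validation
    for name, sim in members.items():
        scores = split_sample_scores(sim, obs, split)
        (cal_rmse, cal_r), (val_rmse, val_r) = scores["calibration"], scores["validation"]
        print(f"{name:14s} cal RMSE={cal_rmse:.2f} r={cal_r:.3f} | "
              f"val RMSE={val_rmse:.2f} r={val_r:.3f}")
```

Scoring the two periods separately matters because a structure can rank first in calibration and still lose in validation, as reported above for (s5). With real data, the two slices would be the study's calibration and validation records.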
Overall, this study suggests that exploring an
ensemble of model structures is a useful
approach both for identifying significant errors in the
hydrological simulations and for improving their overall
accuracy. The study has developed
a framework that allows hydrologists to investigate
different model structures. An important goal for
future work is the extension of this framework to other
structures. A more challenging problem is devising a
strategy for selecting appropriate model structures for
particular applications and particular data.
The approach adopted here is one way forward.
Further work is required to evaluate how the different
sources of uncertainty can be treated together both for
hydrological simulation and hydrological forecasting.
Further research is also recommended to
evaluate the performance of model structure
ensembles in an operational context.
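Finding 5 treats the range of plausible model structures as an estimate of structural uncertainty. One simple way to summarise such an ensemble is the per-time-step envelope across structures, together with the fraction of observations that the envelope brackets. The sketch below assumes all simulations share a common time axis. The envelope-and-coverage summary, the function names and the data are illustrative assumptions, not the uncertainty measures reported in the study.

```python
def structure_envelope(members):
    """Per-time-step minimum and maximum across model-structure simulations."""
    columns = list(zip(*members))
    lower = [min(c) for c in columns]
    upper = [max(c) for c in columns]
    return lower, upper


def coverage(obs, lower, upper):
    """Fraction of observations that fall inside the envelope (bounds inclusive)."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(obs, lower, upper))
    return inside / len(obs)


def mean_width(lower, upper):
    """Average envelope width, a crude scalar summary of structural spread."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


if __name__ == "__main__":
    # Hypothetical flows (m3/s) from three invented structures; not study data.
    obs = [10.0, 12.0, 30.0, 22.0, 15.0, 11.0, 28.0, 20.0]
    members = [
        [11.0, 13.0, 34.0, 24.0, 16.0, 12.0, 33.0, 23.0],
        [9.0, 11.0, 26.0, 20.0, 14.0, 10.0, 24.0, 18.0],
        [10.0, 10.0, 25.0, 27.0, 16.0, 11.0, 22.0, 26.0],
    ]
    lower, upper = structure_envelope(members)
    print("envelope:", list(zip(lower, upper)))
    print(f"coverage={coverage(obs, lower, upper):.2f} "
          f"mean width={mean_width(lower, upper):.2f}")
```

This envelope reflects only the spread between structures. Parametric, measurement and rainfall input uncertainty, which the study estimated separately, would have to be added on top of it.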
Acknowledgements
The authors would like to thank Michael Smith and
his colleagues on the staff of Hydrology Lab, OHD,
NWS for their assistance throughout this study.
David Tarboton, University of Utah, USA, and Ross
Woods, National Institute of Water and Atmospheric Research,
New Zealand, provided the gridded NEXRAD data
to the authors. The authors would like to acknowledge
Jacob Larsen, DHI Water and Environment, for
his assistance with the original submission and
Jean-Philippe Drecourt, DHI Water and Environment,
for fruitful discussions concerning parametric
uncertainty. The additional support to the second
author provided through the University of Washington’s
Valle Exchange and Scholarship Program is also
acknowledged and appreciated. The development of
the methodology for propagation of rainfall
uncertainty was carried out by Anne Katrine Falk,
DHI Water and Environment, with the support of the
EU 5th Framework Research Programme, FLOOD
RELIEF, contract EVK1-CT2002-00171,
http://projects.dhi.dk/floodrelief/.
References
Abbott, M.B., Refsgaard, J.C., 1996. Distributed Hydrological
Modelling, Kluwer, Dordrecht.
Abbott, M.B., Bathurst, J.C., Cunge, J.A., O’Connell, P.E.,
Rasmussen, J., 1986a. An introduction to the European
Hydrological System—Systeme Hydrologique Europeen,
SHE, 1: History and philosophy of a physically-based distributed
modelling system. Journal of Hydrology 87, 45–59.
Abbott, M.B., Bathurst, J.C., Cunge, J.A., O’Connell, P.E.,
Rasmussen, J., 1986b. An introduction to the European
Hydrological System—Systeme Hydrologique Europeen,
SHE, 2: Structure of a physically-based distributed modelling
system. Journal of Hydrology 87, 61–77.
Atkinson, S., Woods, R.A., Sivapalan, M., 2002. Climate and
landscape controls on water balance model complexity over
changing time scales. Water Resources Research 38(12),
1314, doi:10.1029/2002WR001487, pp. 50.1–50.17.
Atkinson, S., Sivapalan, M., Woods, R.A., Viney, N.R., 2003.
Dominant physical controls of hourly streamflow predictions
and an examination of the role of spatial variability: Mahurangi
catchment, New Zealand. Advances in Water Resources 26(3),
219–235.
Beven, K.J., 1997. Distributed Hydrological Modelling:
Applications of the TOPMODEL Concept, Wiley, Chichester.
Beven, K.J., 2000. Rainfall-Runoff Modelling: The Primer, Wiley,
England.
Beven, K.J., Freer, J., 2001. Equifinality, data assimilation and
uncertainty estimation in mechanistic modelling of complex
environmental systems using the GLUE methodology. Journal of
Hydrology 249, 11–29.
Boyle, D.P., Gupta, H.V., Sorooshian, S., Koren, V., Zhang, Z.,
Smith, M., 2001. Toward improved streamflow forecasts: value
of semidistributed modeling. Water Resources Research 37(11),
2749–2759.
Butts, M.B., Klinting, A., van Kalken, T., Cadman, D., Fenn, C.,
Høst-Madsen, J., 2001. Design and development of an Internet-
based flood forecasting system using real-time rainfall, radar,
and river flow data. In: Falconer, R.A., Blain, W.R. (Eds.),
Proceedings of River Basin Management 2001, Cardiff, WIT
Press, pp. 139–148.
Butts, M.B., Hoest-Madsen, J., Refsgaard, J.C., 2002. Hydrologic
Forecasting, Encyclopaedia of Physical Science and Technology,
Third ed.
Calver, A., Wood, W.L., 1995. The Institute of Hydrology
distributed model. In: Singh, V.P., (Ed.), Computer Models of
Watershed Hydrology, Water Resources Publications, Colorado,
USA, pp. 595–626.
Ciarapica, L., Todini, E., 2002. TOPKAPI: a model for the
representation of the rainfall-runoff process at different scales.
Hydrological Processes 16, 207–229.
Duan, Q., Sorooshian, S., Gupta, V., 1992. Effective and efficient
global optimization for conceptual rainfall-runoff models.
Water Resources Research 28(4), 1015–1031.
Duan, Q., Sorooshian, S., Gupta, V.K., 1994. Optimal use of the
SCE-UA global optimization method for calibrating watershed
models. Journal of Hydrology 158, 265–284.
Farmer, D., Sivapalan, M., Jothityangkoon, C., 2003. Climate, soil
and vegetation controls upon the variability of water balance in
temperate and semi-arid landscapes: downward approach to
hydrological prediction. Water Resources Research 39(2),
1035, doi:10.1029/2001WR000328.
Flury, M., Fluhler, H., Jury, W.A., Leuenberger, J., 1994.
Susceptibility of soils to preferential flow of water: a field
study. Water Resources Research 30, 1945–1954.
Freeze, R.A., Harlan, R.L., 1969. Blueprint for a physically-based
digitally-simulated hydrological response model. Journal of
Hydrology 9, 237–258.
Georgakakos, K.P., Seo, D.-J., Gupta, H., Schaake, J., Butts,
M.B., 2004. Characterising streamflow simulation uncertainty
through multimodel ensembles. Journal of Hydrology
298(1–4), 222–241.
Havnø, K., Madsen, M.N., Dørge, J., 1995. MIKE 11—a
generalized river modelling package. In: Singh, V.P., (Ed.),
Computer Models of Watershed Hydrology, Water Resources
Publications, Colorado, USA, pp. 733–782.
Hollenbeck, K.J., Jensen, K.H., 1998. Experimental evidence of
randomness and non-uniqueness in unsaturated outflow
experiments designed for hydraulic parameter estimation.
Water Resources Research 34(4), 595–602.
Hornberger, G.M., Germann, P.F., Beven, K.J., 1991. Throughflow
and solute transport in an isolated sloping soil block in a forested
catchment. Journal of Hydrology 124, 81–99.
Koren, V.I., Reed, S., Smith, M., Zhang, Z., Seo, D.-J., 2003.
Hydrology Laboratory Research Modeling System (HL-RMS)
of the National Weather Service. Journal of Hydrology in
review.
Lohmann, D., Raschke, E., Nijssen, B., Lettenmaier, D.P., 1998.
Regional scale hydrology: I. Formulation of the VIC-2L model
coupled to a routing model. Hydrological Sciences Journal 43,
131–141.
Madsen, H., 2000. Automatic calibration of a conceptual rainfall-
runoff model using multiple objectives. Journal of Hydrology
235, 276–288.
Madsen, H., 2003. Parameter estimation in distributed
hydrological catchment modelling using automatic calibration
with multiple objectives. Advances in Water Resources 26,
205–216.
Mortensen, A.P., Glass, R.J., Hollenbeck, K.J., Jensen, K.H., 2001.
Visualization of microscale phase displacement processes in
retention and outflow experiments: Non-uniqueness of unsaturated
flow properties. Water Resources Research 37(6),
1627–1640.
Rajaram, H., Georgakakos, K.P., 1989. Recursive parameter
estimation of hydrologic models. Water Resources Research
25(2), 281–294.
Reed, S., Koren, V., Smith, M.B., et al., 2004. Overall Distributed
Model Intercomparison Project results. Journal of Hydrology
298(1–4), 27–60.
Refsgaard, J.C., 1996. Terminology, modelling protocol and
classification of hydrological model codes. In: Abbott, M.B.,
Refsgaard, J.C. (Eds.), Distributed Hydrological Modelling,
Kluwer, Dordrecht, pp. 17–39.
Refsgaard, J.C., 1997. Validation and intercomparison of
different updating procedures for real-time forecasting. Nordic
Hydrology 28, 65–84.
Refsgaard, J.C., Butts, M.B., 1999. Determination of grid scale
parameters in catchment modelling by upscaling local scale
parameters (Invited Paper). In: Fejen, J., Wiyo, K. (Eds.),
Modelling of Transport Processes in Soils, International Workshop
of EurAgEng’s Field of Interest in Soil and Water, 24–26
November, 1999, Leuven, Belgium, pp. 650–665.
Refsgaard, J.C., Knudsen, J., 1996. Operational validation and
intercomparison of different types of hydrological models.
Water Resources Research 32(7), 2189–2202.
Refsgaard, J.C., Storm, B., 1995. MIKE SHE. In: Singh, V.P., (Ed.),
Computer Models of Watershed Hydrology, Water Resources
Publications, Colorado, USA, pp. 809–846.
Refsgaard, J.C., Rosbjerg, D., Markussen, L.M., 1983. Application
of the Kalman Filter to Real Time Operation and to Uncertainty
Analyses in Hydrological Modelling, Scientific Procedures
Applied to the Planning, Management and Design of Water
Resources Systems (Proceedings of the Hamburg Symposium,
August, 1983), IAHS Publ. No. 147, pp. 273–282.
Seo, D.-J., Koren, V., Cajina, N., 2003. Real-time variational
assimilation of hydrologic and hydrometeorological data into
operational hydrologic forecasting. Journal of Hydrometeorology
4, 627–641.
Singh, V.P. (Ed.), 1995. Computer Models of Watershed Hydrology,
Water Resources Publications, Colorado, USA, p. 1130.
Smith, M.B., Seo, D.-J., Koren, V.I., Reed, S., Zhang, Z., Duan,
Q.-Y., Moreda, F., Cong, S., 2004. The Distributed Model
Intercomparison Project (DMIP): motivation and experiment
design. Journal of Hydrology 298(1–4), 4–26.
Storm, B., Refsgaard, A., 1996. Distributed physically-based
modelling of the entire land phase of the hydrological cycle.
In: Abbott, M.B., Refsgaard, J.C. (Eds.), Distributed Hydro-
logical Modelling, Kluwer, Dordrecht.
World Meteorological Organisation (WMO), 1994. Guide
to Hydrological Practices: Data Acquisition and Processing,
Analysis, Forecasting and Other Applications, fifth ed., WMO Publ. No.
168.
World Meteorological Organisation (WMO), 2001. Report on the
Operational Use of EPS to Forecast Severe Weather and
Extreme Events, WMO Commission on Basic Systems, Meeting
of Expert Team on Ensemble Prediction Systems, Tokyo, Japan,
15–19 October 2001, CBS ET/EPS/Doc. 3(7) (9.X.2001),
Geneva, Switzerland, pp. 7.