Uncertainty assessment of a process-based integrated catchment model of phosphorus


ORIGINAL PAPER


Sarah Dean · Jim Freer · Keith Beven · Andrew J. Wade · Dan Butterfield

Published online: 7 October 2008

© Springer-Verlag 2008

Abstract Despite the many models developed for phosphorus concentration prediction at differing spatial and temporal scales, there has been little effort to quantify uncertainty in their predictions. Quantifying model prediction uncertainty is desirable for informed decision-making in river-systems management. An uncertainty analysis of the process-based integrated catchment model of phosphorus (INCA-P), within the generalised likelihood uncertainty estimation (GLUE) framework, is presented. The framework is applied to the Lugg catchment (1,077 km²), a River Wye tributary on the England–Wales border. Daily discharge and monthly phosphorus (total reactive and total) observations, for a limited number of reaches, are used to initially assess the uncertainty and sensitivity of 44 model parameters identified as being most important for discharge and phosphorus predictions. This study demonstrates that parameter homogeneity assumptions (spatial heterogeneity is treated as land use type fractional areas) can achieve higher model fits than a previous expertly calibrated parameter set. The model is capable of reproducing the hydrology, but a threshold Nash-Sutcliffe coefficient of determination (E or R²) of 0.3 is not achieved when simulating observed total phosphorus (TP) data in the upland reaches or total reactive phosphorus (TRP) in any reach. Despite this, the model reproduces the general dynamics of TP and TRP in point source dominated lower reaches. This paper discusses why this application of INCA-P fails to find any parameter sets that simultaneously describe all observed data acceptably. The discussion focuses on the uncertainty of readily available input data, and on whether such process-based models should be used when there are insufficient data to support the many parameters.

Keywords INCA-P · GLUE · Uncertainty estimation · Phosphorus models · Diffuse agricultural pollution · Water quality modelling

S. Dean · J. Freer · K. Beven
Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK

A. J. Wade · D. Butterfield
Aquatic Environments Research Centre, Department of Geography, University of Reading, Reading RG6 6AB, UK

Present Address: J. Freer (✉)
School of Geographical Sciences, University of Bristol, University Road, Bristol BS8 1SS, UK
e-mail: [email protected]

Stoch Environ Res Risk Assess (2009) 23:991–1010

DOI 10.1007/s00477-008-0273-z

1 Introduction

The Water Framework Directive (WFD) legislation was introduced in December 2000 (EC 2000/60/EC) to improve the chemical and ecological status of European freshwater, transitional and coastal waters. This legislation is fundamentally a new approach, as it considers the catchment as a whole. Every catchment designated as being at significant risk (e.g. from diffuse pollution or acidification) requires a river basin management plan (RBMP). The WFD supersedes seven previous pieces of legislation and will address all aspects of water quality, as well as factors within catchments that could negatively impact on water quality and associated ecology; this is the first time this important link has been enshrined in a legislative framework. The implications for terrestrial and aquatic habitats are great, and necessitate the furthering of current knowledge of pressures and impacts. Consequently, there is a real need to develop tools to predict the responses of surface water and groundwater, and their associated dependent habitats, to both anthropogenic pressures and remediation projects. Liu et al. (2005) state that water quality models are a necessity in such catchment management because of their ability to apply current knowledge to predict water quality in response to different scenarios (e.g. the likely consequences of different farming practices).

Dynamic, process-based models of pollutant sources and catchment dynamics are necessarily complex because they attempt to describe all factors and processes so that the relative importance of these may be understood and investigated in response to environmental change. Fully distributed process-based models are the most complex form of environmental model as they attempt to model every process deemed important for every location in a catchment, generally on a grid basis. However, it is generally accepted that, to develop a pragmatic approach, a compromise is required between available data, process representation and runtime speed when using the model within a Monte Carlo-based sensitivity or uncertainty analysis (Langan et al. 1997). The key sources and processes controlling nutrient water quality characteristics are well established, but the understanding of how the sources and processes vary in time and space is still limited. This is due to the heterogeneity of the environmental factors which define source areas and control process rates and delivery from the land to the stream network, such as land use, soil type, moisture and temperature, and flow routing. The data available to develop and apply predictive models are generally insufficient, even for small research catchments. Thus, while it is useful to develop models based on process understanding, they will always necessarily be simplifications of reality. These simplifying assumptions are a source of uncertainty in a model, and the robustness of any model application will depend upon the validity of the assumptions made. It is important, therefore, that the uncertainties in model predictions are well understood, and the estimation of prediction uncertainties in water quality modelling is becoming increasingly appreciated (Krueger et al. 2007; Page et al. 2004, 2005; Radwan et al. 2004; Rode et al. 2007; Singh et al. 2007; van Griensven and Meixner 2006).

Parameter values in process-based models are often difficult to measure or estimate, especially at the scale required by the model, i.e. the effective model unit scale, which is the scale at which processes are represented in the model. This necessitates the use of effective parameter values to compensate for the underlying variability in processes and site characteristics, and for the limits of the model’s process representations (Beven 1996, 2002, 2006). In general, there are no techniques available for measuring effective parameter values, so they are estimated through calibration. Process-based models often suffer from over-parameterisation, where the model parameters cannot be identified with certainty from the information content of the available observed data. This often leads to poorly constrained parameter values, with many different parameter sets producing acceptable fits to the observed data, a condition termed equifinality (Beven 2006). It also means that any ‘optimum’ parameter set will not be robust (i.e. may change) against a different period of calibration data or errors in observations. Attempts are being made to overcome this, for example through the use of high-frequency data (Arnscheidt et al. 2007; Jordan et al. 2005, 2007; Kirchner et al. 2004), soft data (e.g. Rankinen et al. 2006), and the assessment of internal measurements, so that models are tested against observations made at points within a catchment rather than just at the catchment outlet (e.g. Freer et al. 2004; Gallart et al. 2007).

Sensitivity and uncertainty analyses provide model users with information regarding the effect of model parameters and input data on the resultant model prediction. Sensitivity analysis is particularly concerned with identifying the parameters that are most influential in the model simulations, whereas an uncertainty analysis is used to estimate the error in a model output given uncertainty in the model structure, parameters and input data. The generalised likelihood uncertainty estimation (GLUE) technique (Beven and Binley 1992) is a framework for evaluating a model, given an acceptance of the equifinality concept. GLUE provides information on the uncertainties in a model’s predictions, which helps model users to understand the confidence they can have in those predictions. Beven (2006, 2008) provides a full background on the GLUE methodology. GLUE has been applied to many different environmental models, including sediment and geochemical models (Beven and Binley 1992; Brazier et al. 2000; Freer et al. 1996, 2003, 2004; Zak and Beven 1999b). Here, GLUE is applied to the integrated catchment model of phosphorus (INCA-P), in an application to the Lugg catchment, UK.

INCA-P is a physically based, highly parameterised model requiring a variety of input variables, some of which are often poorly known and impossible to measure, and so require calibration. To date, there has been no published uncertainty analysis of INCA-P, although the in-stream part of the model structure has been subject to both sensitivity and uncertainty analysis (Wade et al. 2001, 2002b, c). There have also been numerous assessments of the ability of the integrated catchment model of nitrogen (INCA-N) to predict nitrogen dynamics at the catchment scale (Granlund et al. 2004; Limbrick et al. 2000; McIntyre et al. 2005; Raat et al. 2004; Rankinen et al. 2006). The model developers envisage that INCA-P could be used as part of a model hierarchy in which steady-state models, e.g. the Phosphorus Indicator Tool (Heathwaite et al. 2003) or the Export Coefficient Model (Johnes 1996), would produce a summary of the national export of phosphorus, and INCA-P would provide a detailed assessment of catchments which are of particular concern or interest (Wade et al. 2004).

The objectives of this study are:

1. to assess the model uncertainty in the application of the INCA-P model to the Lugg catchment, using the GLUE uncertainty framework;

2. to investigate parameter uncertainty and how these uncertainties impact upon the model predictions;

3. to consider whether, allowing for the lack and quality of data available for calibration, such complex models are suitable for assessing phosphorus dynamics at the catchment scale; and

4. to consider the suitability of INCA-P as a potential phosphorus modelling tool for implementing the WFD.

2 Materials and methods

2.1 The GLUE framework

A full account of the GLUE methodology and rationale can be found in Beven and Binley (1992), Beven and Freer (2001), and Beven (2006, 2008). What follows is a brief description.

GLUE is an extension of the regional sensitivity analysis (RSA) proposed by Spear and Hornberger (1980). For any given set of observed data and particular choice of performance measure, there will be an optimal model structure and parameter set that best describes that dataset. However, RSA and GLUE recognise that there may be many models (structures and parameter sets) that give acceptable results and, equally, that given a different or additional set of calibration data, or a different performance measure, the optimal model is likely to be different. As GLUE considers parameter sets as opposed to individual parameters, interactions between parameters in providing a good (or bad) fit are implicitly accounted for. GLUE differs from RSA in the way it treats acceptable model simulations. Both use a performance measure threshold to define acceptable models but, whereas RSA treats all acceptable models equally in looking at global sensitivities, GLUE calculates a likelihood weight for each simulation by evaluating the performance of the simulation in comparison with observed data, and then uses those weights to evaluate uncertainties in predicted model outputs over all the simulations considered acceptable.

GLUE requires a number of prior decisions to be made, which should always be reported:

1. which parameters to vary;

2. which model structure(s) to consider;

3. the ranges within which the parameter values should be varied;

4. the likelihood weights to be used in assessing the performance of a model simulation; and

5. a procedure for creating the uncertainty prediction bounds.
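The five decisions listed above map onto a short Monte Carlo loop. The following is a minimal Python sketch of such a loop, not the procedure used in this study: the function name `glue_bounds`, the uniform sampling, the use of R² as the weight, the default threshold of 0.3, the 5% and 95% quantiles and the toy linear model are all illustrative assumptions.

```python
import numpy as np

def glue_bounds(simulate, observed, ranges, n_samples=2000,
                threshold=0.3, quantiles=(0.05, 0.95), seed=0):
    """Toy GLUE run: sample, score, reject, weight and bound (illustrative only)."""
    rng = np.random.default_rng(seed)
    lows = np.array([lo for lo, _ in ranges], dtype=float)
    highs = np.array([hi for _, hi in ranges], dtype=float)
    # Decisions 1 and 3: vary every listed parameter uniformly within its range.
    params = rng.uniform(lows, highs, size=(n_samples, len(ranges)))
    # Decision 2: a single model structure, supplied by the caller as `simulate`.
    sims = np.array([simulate(p) for p in params])
    # Decision 4: score each run; runs below the threshold get zero weight.
    sse = ((observed - sims) ** 2).sum(axis=1)
    sst = ((observed - observed.mean()) ** 2).sum()
    score = 1.0 - sse / sst
    weights = np.where(score >= threshold, score, 0.0)
    if weights.sum() == 0.0:
        raise ValueError("no acceptable simulations; revisit ranges or threshold")
    weights = weights / weights.sum()
    # Decision 5: likelihood-weighted quantiles of the output at each time step.
    lower = np.empty(observed.size)
    upper = np.empty(observed.size)
    for t in range(observed.size):
        order = np.argsort(sims[:, t])
        cdf = np.cumsum(weights[order])
        lower[t] = sims[order, t][np.searchsorted(cdf, quantiles[0])]
        upper[t] = sims[order, t][np.searchsorted(cdf, quantiles[1])]
    return params, weights, lower, upper

# Hypothetical two-parameter model and synthetic "observations".
x = np.linspace(0.0, 1.0, 20)
observed = 2.0 * x + 1.0
toy_model = lambda p: p[0] * x + p[1]
params, weights, lower, upper = glue_bounds(
    toy_model, observed, [(0.0, 4.0), (0.0, 2.0)])
```

Setting every non-zero weight to the same value would reduce the weighting step to the equal treatment of acceptable models that the text attributes to RSA.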

Likelihood weights are calculated for all acceptable model simulations. The choice of performance measure is important; it should reflect our best understanding of the possible errors in the observed data (see discussion in Beven 2006). However, in certain cases, such as this study, the possible combined errors in the modelling application (input and output observation error combined with model structure error) are not known and, therefore, the choice of measure is necessarily subjective but must provide a relative measure of model performance. The resultant likelihood weights are defined as 0 for unacceptable model simulations and should increase in value as model simulations improve. The performance measures used in this study are:

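$$R^2 = 1 - \frac{SSE}{SST} \quad \text{when } R^2 > 0 \qquad (1)$$

$$\frac{1}{RMSE} = 1 \Big/ \sqrt{\frac{SSE}{n}} \qquad (2)$$

where

$$SST = \sum_{i=1}^{n} w_i \left(y_i - \bar{y}\right)^2 \qquad (3)$$

$$SSE = \sum_{i=1}^{n} w_i \left(y_i - \hat{y}_i\right)^2 \qquad (4)$$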

where $i$ is the current time step, $n$ is the number of data points, SST is the sum of squares of the observations ($y_i$) around the observed mean ($\bar{y}$), SSE is the sum of squared model errors with $\hat{y}_i$ being the simulated value, RMSE is the root mean squared error, and $R^2$ is a coefficient of determination; it compares the ability of the model to simulate the observed data with that of the mean of the observed data used as a predictor. $R^2$ usually ranges between 0 and 1, but when SSE is greater than SST, the mean of the data would be a better predictor, and a negative $R^2$ value is obtained. 1/RMSE is also an SSE-based measure of fit, but always takes positive values. Both these coefficients are biased towards fitting high values of discharge and concentration because they are based on squared errors. Wade et al. (2004) used $R^2$ in a previous assessment of INCA-P with which we compare our simulations, and this measure has also been commonly used in other uncertainty analyses (McIntyre et al. 2005; McIntyre and Wheater 2004; Rankinen et al. 2006). Therefore, $R^2$ was the performance measure used to determine whether a model simulation was acceptable. However, because the total phosphorus (TP) $R^2$ results were almost all negative (see later discussion), 1/RMSE was used to evaluate the simulations in terms of sensitivity and prediction bounds, as negative $R^2$ values would not provide an appropriately weighted performance measure. $R^2$ is nevertheless still referred to when making comparisons with Wade et al. (2004).
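The two performance measures in Eqs. 1–4 can be written out in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the weights `w` default to 1 because the text does not specify them, and setting non-positive R² to zero is one reading of the condition attached to Eq. 1.

```python
import numpy as np

def sum_squares(obs, sim, w=None):
    """Return (SSE, SST) following Eqs. 3 and 4; w defaults to unit weights (assumption)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    w = np.ones_like(obs) if w is None else np.asarray(w, dtype=float)
    sst = np.sum(w * (obs - obs.mean()) ** 2)
    sse = np.sum(w * (obs - sim) ** 2)
    return sse, sst

def r2_weight(obs, sim, w=None):
    """Eq. 1: R^2 where positive, otherwise 0 (treated here as unacceptable)."""
    sse, sst = sum_squares(obs, sim, w)
    return max(1.0 - sse / sst, 0.0)

def inv_rmse(obs, sim, w=None):
    """Eq. 2: 1/RMSE; positive for any imperfect fit (undefined when SSE is 0)."""
    sse, _ = sum_squares(obs, sim, w)
    return 1.0 / np.sqrt(sse / len(obs))

obs = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]   # SSE = 0.1, SST = 5
poor = [4.0, 3.0, 2.0, 1.0]   # SSE = 20 > SST, so R^2 is negative
```

For the `poor` series R² is negative and carries no weight, whereas 1/RMSE remains positive and can still rank simulations, which is the reason given in the text for using 1/RMSE for the TP results.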

2.2 The INCA-P model

INCA-P was developed to investigate transport and retention of phosphorus in terrestrial and aquatic environments, and to quantify the impacts of phosphorus loads on in-stream macrophyte biomass. A full description of INCA-P, including the process equations used, appeared in Wade et al. (2002a), and an initial application is reported in Wade et al. (2007). Only those parts relevant to this study are described herein. INCA-P is a dynamic, semi-distributed, process-based model which predicts discharge and the concentrations of suspended sediment, soluble reactive phosphorus (SRP) and TP in stream water by tracking discharge and phosphorus through the soil and groundwater to the main channel. The model also simulates the following processes: bed sediment resuspension, suspended sediment deposition, the growth effect of phosphorus on macrophytes and epiphytes, and the feedback of that growth on phosphorus concentrations in the stream water.

Briefly, the INCA-P model consists of three components:

• A land-phase hydrological model: this calculates discharge through different pathways (direct, soil and groundwater) and their stores. Discharge and phosphorus are controlled through this component.

• A land-phase phosphorus model: this component deals with the various phosphorus stores in soil and groundwater, and phosphorus transformations.

• An in-stream phosphorus model: this component simulates the phosphorus processes operating once it reaches the stream (dilution and transformations), as well as the concomitant algal, epiphyte and macrophyte growth responses.

INCA-P is based on a simple mixing model approach, whereby conceptually the water (and any phosphorus being transported) is mixed from the different land uses (up to six user-defined classes) within each reach and then routed along the main stream. The in-stream model is based on the Kennet model (Wade et al. 2001), which simulates in-stream phosphorus and macrophyte/epiphyte dynamics (Wade et al. 2002a). Phosphorus sources can include fertiliser, plant residue, slurry, animal waste and wastewater.
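The mixing concept described at the start of this paragraph can be illustrated with a flow-weighted average. The sketch below is a generic complete-mixing calculation and does not reproduce the INCA-P equations: the function names and units are assumptions, and stores, travel times, transformations and point sources are all ignored.

```python
def mix_reach(land_uses, upstream=(0.0, 0.0)):
    """Completely mix land-use contributions with the inflow from upstream.

    land_uses: iterable of (flow_m3_s, conc_mg_l), one pair per land-use class.
    upstream:  (flow_m3_s, conc_mg_l) entering from the reach above.
    Returns (flow, concentration) leaving the reach.
    """
    flows = [upstream[0]] + [q for q, _ in land_uses]
    loads = [upstream[0] * upstream[1]] + [q * c for q, c in land_uses]
    total_flow = sum(flows)
    conc = sum(loads) / total_flow if total_flow > 0 else 0.0
    return total_flow, conc

def route(reaches):
    """Pass the mixed outflow of each reach to the next one downstream."""
    out = (0.0, 0.0)
    results = []
    for land_uses in reaches:
        out = mix_reach(land_uses, out)
        results.append(out)
    return results

# Two hypothetical reaches: the first has two land-use classes, the second one.
results = route([[(1.0, 0.1), (3.0, 0.5)], [(4.0, 0.0)]])
```

In this example the first reach leaves with 4 m³ s⁻¹ at 0.4 mg l⁻¹, and dilution by phosphorus-free water in the second reach halves the concentration to 0.2 mg l⁻¹.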

Typically, when INCA-P is applied to a catchment, the sensitive parameter values are calibrated by comparing model output to observed discharge, suspended sediment, SRP and TP measured along the main channel. The user guide (Butterfield et al. 2004) contains the calibration guidelines. A parameter set derived by the model developers, using expert calibration (i.e. manual adjustment of parameters) of INCA-P to the Lugg catchment, is used as a benchmark against which the results obtained from this GLUE analysis are compared (Wade et al. 2004, 2007).

INCA-P has been used recently on behalf of the Environment Agency (EA), UK, to examine the effectiveness of eutrophication control through reductions in phosphorus (Wade et al. 2004, 2007). It was applied to three catchments in the UK, including the Lugg catchment. The report concluded that the representation of discharge, suspended sediment and phosphorus “appeared reasonable” (Wade et al. 2004). It was noted that there was a clear need for detailed point source data and a more general assessment of the relative contributions of point and diffuse sources, for which the current monitoring networks of the EA are unfortunately not yet sufficient.

2.3 Evaluation catchment

A full description of the River Lugg catchment can be found in Wade et al. (2004, 2007). Data were supplied by the model developers at the University of Reading as an example of a catchment for which they felt INCA-P provided a good representation of observed discharge and phosphorus dynamics. The River Lugg was chosen as a study area because it is known to have very high (approximately 1 mg l⁻¹) stream water phosphorus concentrations (measured as TP) in its lower reaches. Such is the concern about its water quality that, in 1994, the Lugg was designated a ‘eutrophic sensitive’ area under the Urban Wastewater Treatment Directive.

The River Lugg is a tributary of the Wye; its source is in Powys. The catchment area of the entire Lugg is 1,077 km², and the catchment area to the lowest gauged point is 885 km²; the long-term annual mean rainfall (1961–1990) is 977 mm in the north of the catchment and 877 mm in the south. The headwaters drain Silurian rocks, and the impermeable bedrock is covered by extensive deposits of gravel in the valleys. The geology of the lower Lugg is predominantly Devonian Old Red Sandstone. The upland area consists of a mix of forestry, grazing and arable land, whereas the lowland is dominated by arable cultivation (http://www.environment-agency.gov.uk/hiflows). Many point sources of pollution have been identified, including sewage effluent, trade effluent and discharges from private outlets (Wade et al. 2004, 2007). The most important of these for downstream water quality are the Welsh Water sewage works at Leominster, discharging into Reach 10, and the trade effluent discharged from Cadbury’s chocolate factory into Reach 12.

Discharge and water quality in the Lugg catchment are monitored by the EA (Welsh office). Discharge is recorded every 15 min; water quality is monitored by grab sampling approximately monthly. This resolution of water quality sampling is typical for the EA national monitoring network. The catchment is divided into 22 sub-catchments for the INCA-P application. There are 3 sub-catchments with observed discharge data (Byton, Butts Bridge and Lugwardine), 10 sub-catchments with suspended sediment (not evaluated in this paper) and TRP data, and 8 sub-catchments with TP observations (Fig. 1). The INCA-P model provides predictions of SRP as opposed to TRP (for which the same analysis is performed on unfiltered samples). This discrepancy is discussed in Sects. 3 and 4.

2.3.1 Model set-up

The model set-up is given in detail elsewhere (Wade et al. 2004); what follows is a summary of the input datasets, the choice of parameters to be varied and the parameter ranges. The INCA-P hydrological input data requirements are: hydrologically effective rainfall (HER), actual precipitation, air temperature, and soil moisture deficit (SMD). In the UK, output from the Meteorological Office Rainfall and Evaporation Calculation System (MORECS) is typically used to derive the HER and SMD based on measurements of temperature, bright sunshine hours, wind and humidity. Single-site daily MORECS time series data for HER, SMD and air temperature were licensed from the Meteorological Office. The MORECS time series are based on data from two weather stations, one representing the north of the catchment (Llandrindod Wells) and the other the south (Madley) (Table 1). HER is an expression of excess (or effective) rainfall (Hough and Jones 1997); it is the water that penetrates the soil after interception and evaporation have been taken into account. The MORECS model also produces a time series of SMD, which is used to control the rates of phosphorus transformations in INCA-P.

Fig. 1 The River Lugg and its tributaries: the location of the discharge gauges and nutrient monitoring sites (Wade et al. 2004). NB: suspended solids and TRP are also measured at all of the TP sites

Wade et al. (2004, 2007) applied the Llandrindod data to the north of the catchment (Reaches 1–9) and the Madley data to the south of the catchment (Reaches 10–22); the implications of this are discussed in Sects. 3 and 4. The average annual (1961–1990) rainfall recorded for the three discharge stations at Byton (Reach 4), Butts Bridge (Reach 8) and Lugwardine (Reach 19) is 977, 877 and 814 mm, respectively (http://www.environment-agency.gov.uk/hiflows). Assuming that the rainfall for the six-year period considered in this study was well represented by these averages, the estimated HER is approximately 66, 73 and 33% of rainfall at each gauging station, respectively. This difference results from the MORECS method of calculating HER: in part, it arises because the SMDs of the two weather station areas are very similar during the winter months, when both are close to zero, but quite different during the summer, with Madley experiencing a greater deficit.
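The HER percentages quoted above can be checked against the 1995–2000 means in Table 1. The short Python sketch below does this, assuming (as the reach allocation in the text implies) that Byton and Butts Bridge take the Llandrindod Wells series and Lugwardine takes the Madley series; the variable names are illustrative.

```python
# Mean annual HER for 1995-2000 (mm/year), taken from Table 1.
her = {"Llandrindod Wells": 642.3, "Madley": 269.6}

# Gauge: (reach, long-term mean annual rainfall in mm, weather station applied).
gauges = {
    "Byton": (4, 977.0, "Llandrindod Wells"),
    "Butts Bridge": (8, 877.0, "Llandrindod Wells"),
    "Lugwardine": (19, 814.0, "Madley"),
}

# HER as a percentage of rainfall, rounded to the nearest whole percent.
her_percent = {
    name: round(100.0 * her[station] / rain)
    for name, (reach, rain, station) in gauges.items()
}
print(her_percent)  # {'Byton': 66, 'Butts Bridge': 73, 'Lugwardine': 33}
```

The drop from roughly 70% to 33% therefore reflects the switch of weather station between Reaches 9 and 10 far more than the modest difference in rainfall between the gauges.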

INCA-P has a large number of parameters: 47 land use dependent parameters, 20 reach-dependent parameters, and 25 independent parameters. If this catchment were treated heterogeneously throughout the six land uses and 22 reaches, that would equate to 747 parameters requiring identification. Some simplification in terms of the number of parameters considered is therefore necessary. Irvine et al. (2005) observed that, although complex physics-based models can provide better causal relationships in predicting the impacts of measures to implement the WFD, it is generally not possible to obtain all the data required to develop and apply them. McIntyre et al. (2005) performed two investigations into the sensitivity of INCA-N. The first investigation considered the catchment to be homogeneous, as done in the present study, and the second heterogeneous. They concluded that the sensitivity analysis conducted using a heterogeneous parameter set returned little information not already uncovered through the homogeneous parameter set analysis, although they did find that, in general, the optimum parameter values were reach-dependent (heterogeneous). Using reach-dependent parameters dramatically improved nitrogen simulation, but discharge was unaffected and ammonium simulation was actually worse.
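The figure of 747 parameters follows from the counts given at the start of this paragraph, assuming that each land use dependent parameter takes one value per land use, each reach-dependent parameter one value per reach, and each independent parameter a single value:

$$N = 47 \times 6 + 20 \times 22 + 25 = 282 + 440 + 25 = 747$$

By comparison, 44 parameters are varied under the homogeneous treatment adopted in this study.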

In this study, a pragmatic strategy of homogeneous parameterisation across the catchment is implemented to reduce the computation time required to sample such a large model space. The success with which the model space is sampled is discussed in Sect. 3. The assumption is made that each sub-catchment and each land use can be described by the same set of INCA-P parameters, as there is not enough information for each individual sub-catchment, regarding characteristics that may lead to differences in hydrological or nutrient transport and transformation, to justify treating them differently. Many of the parameters which are kept constant are related to the initialisation of the model. These are set at the values supplied with the model in order to be directly comparable with the results from Wade et al. (2004, 2007), and any parameters which vary between sub-catchments or river reaches are kept as such. The input data and initial values for phosphorus in the soil, groundwater and in-stream reflect the historic applications of phosphorus in the catchment. Land use types are also differentiated by inputs of phosphorus, which vary throughout the catchment. The calibration guide in the INCA-P user manual is used as a basis to identify parameters to vary, and the recommended ranges are used as a starting point for the sampling ranges used in this GLUE analysis. Where a range is not given in the guide, a range is calculated using an order of magnitude variation above and below the default value. Additional parameters, which are thought to be interesting in terms of phosphorus dynamics, are also considered in this analysis, including maximum SMD and animal waste input to land. Initial ranges are adjusted after a trial set of Monte Carlo simulations, as a few parameters perform best on the edge of their range whilst others only produce acceptable results in a narrow band. The base flow index (BFI) range is selected differently: rather than sampling the recommended range of 0–1, the range 0.6–0.8 is used, since Wade et al. (2004, 2007) already suggest that the values lie between 0.63 and 0.7 for the sub-catchments of the River Lugg. Table 2 lists the 44 parameters being varied together with the minimum and maximum values (the expert calibration values are included for comparison); where there is more than one value, this indicates that land uses or reaches were parameterised heterogeneously in Wade et al. (2004, 2007). Of the 44 parameters varied, 11 are expertly calibrated heterogeneously for each reach or land use; of those, three relate to hydrology and eight to phosphorus. The remaining parameters are kept at default values, as specified in the user manual. In the studies by Wade et al. (2004, 2007), BFI was the only parameter relating to groundwater that was calibrated as reach-dependent. The impact of treating this parameter homogeneously is explored in more detail in Sects. 3 and 4. Parameters not sampled kept their

Table 1 Summary statistics for the MORECS time series relating to weather stations at Llandrindod Wells and Madley, including the mean for the study period (1995–2000)

| Year | Total hydrologically effective rainfall (mm/year), Llandrindod Wells | Total hydrologically effective rainfall (mm/year), Madley | Mean soil moisture deficit (mm/year), Llandrindod Wells | Mean soil moisture deficit (mm/year), Madley | Mean air temperature (°C), Llandrindod Wells | Mean air temperature (°C), Madley |
|---|---|---|---|---|---|---|
| 1995 | 503.1 | 316.1 | 49.1 | 45.6 | 9.7 | 10.5 |
| 1996 | 391.1 | 205.6 | 36.6 | 51.3 | 8.6 | 9.2 |
| 1997 | 395.4 | 244.9 | 28.6 | 21.1 | 9.8 | 10.4 |
| 1998 | 819.5 | 273.1 | 8.5 | 45.7 | 9.7 | 10.3 |
| 1999 | 842.6 | 266.5 | 17.3 | 25.6 | 9.8 | 10.6 |
| 2000 | 902.7 | 311.4 | 11.6 | 42.0 | 9.4 | 10.2 |
| 1995–2000 | 642.3 | 269.6 | 25.3 | 38.5 | 9.5 | 10.2 |

Table 2 INCA-P parameter value ranges used to simulate the Lugg catchment (multiple values in the ‘expert-calibration’ column indicate that

there is more than one parameter to represent the six land uses or 22 reaches)

Columns: Parameter, Expert calibration, GLUE parameter range (minimum and maximum), Unit

Initial direct runoff flow (applies to 6 land uses) 0.001 0 0.001 m3 s-1

Initial soil water flow (applies to 6 land uses) 0.005 0.001 0.01 m3 s-1

Initial groundwater flow (applies to 6 land uses) 0.008 0.003 0.015 m3 s-1

Initial soil water organic P (applies to 6 land uses) 0.1; 0.01 0 1 mg l-1

Initial soil water inorganic P (applies to 6 land uses) 0.1; 0.01 0 1 mg l-1

Initial groundwater organic P (applies to 6 land uses) 0.001 0 0.01 mg l-1

Initial groundwater inorganic P (applies to 6 land uses) 0.001 0 0.01 mg l-1

Initial direct runoff organic P (applies to 6 land uses) 0.025; 0.01 0 0.1 mg l-1

Initial direct runoff inorganic P (applies to 6 land uses) 0.025; 0.01 0 1 mg l-1

Initial firmly bound organic P (applies to 6 land uses) 0.01 0 0.01 mg l-1

Initial firmly bound inorganic P (applies to 6 land uses) 0.01 0 0.01 mg l-1

Initial soil water drainage volume (applies to 6 land uses) 5E+05 1E+05 1E+06 m3

Initial groundwater drainage volume (applies to 6 land uses) 1E+07 1E+06 1E+08 m3

Initial direct runoff drainage volume (applies to 6 land uses) 1,000 1,000 6E+07 m3

Organic P uptake (applies to 6 land uses) 0.08; 6; 0.18; 0 0 2 m day-1

Immobilisation rate (applies to 6 land uses) 0 0 0.1 m day-1

Mineralisation rate (applies to 6 land uses) 0 0 0.25 m day-1

Firmly bound organic P input (applies to 6 land uses) 0.07; 0.01; 0.22; 0 0 5 m day-1

Firmly bound organic P output (applies to 6 land uses) 0; 0.1 0 5 m day-1

Inorganic P uptake (applies to 6 land uses) 0.08; 6; 0.18; 0 0 6 m day-1

Firmly bound inorganic P input (applies to 6 land uses) 0.07; 0.01; 0.22; 0 0 0.1 m day-1

Firmly bound inorganic P output (applies to 6 land uses) 0; 0.1 0 0.1 m day-1

Max. P uptake by plants (applies to 6 land uses) 100 0 105 kg P ha-1 year-1

Max. soil moisture deficit (applies to 6 land uses) 150 0 170 mm

Plant residue (applies to 6 land uses) 0 0 366 kg P ha-1 year-1

Animal waste (applies to 6 land uses) 0 0 366 kg P ha-1 year-1

Slurry (applies to 6 land uses) 0 0 162 kg P ha-1 year-1

Dirty water (applies to 6 land uses) 0 0 162 kg P ha-1 year-1

Inorganic P fertiliser (applies to 6 land uses) 0 5 80 kg P ha-1 year-1

Direct runoff time constant (applies to 6 land uses) 0.01 0.001 0.1 Day

Soil water time constant (applies to 6 land uses) 1 0.5 5 Day

Groundwater time constant (applies to 6 land uses) 70 10 200 Day

Max soil retention volume (applies to 6 land uses) 0.2 0.1 1 m

Initial flow 0.01 0 2 m3 s-1

Initial TP in the water column 0.01 0.01 0.05 mg l-1

Initial TP in the pore water 0.01 0.01 0.1 mg l-1

Sediment resuspension/settling rate 8 8 800 mm s m-3

P exchange (water column/pore water) 1,000 100 10,000 Day-1

Precipitation of P in water column 0 0 1 Day-1

Kd for bed sediment 0.6 0.0005 1 –

Kd for suspended sediment (applies to 22 reaches) 800 200 1E+06 dm3 kg-1

SUP proportion (applies to 22 reaches) 0.05 0.25 –

Base flow index (applies to 22 reaches) 0.65; 0.7; 0.63 0 1 –

Alpha^a (applies to 22 reaches) 0 0 1 –

See Wade et al. (2002a) for detailed parameter definitions. Other parameters required by the model that are kept fixed are given in the Appendix

^a Alpha is the trigger for overland flow

reach and land use dependent values that were previously

identified by expert calibration (see Appendix); these

parameters are generally related to suspended sediment and

macrophyte/epiphyte growth.

A total of 200,000 simulations are made using Monte

Carlo random uniform sampling of the parameter ranges

shown in Table 2. It is recognised that 200,000 simulations

provide a limited sampling density when considering a 44-

dimensional parameter set; however, the results suggest

that the shape of the response for individual parameters is adequately defined (Figs. 2, 3). For each simulation

run, the performance measures (Eqs. 1, 2) are calculated

for each reach length shown in Fig. 1 relating to an

observed discharge or phosphorus dataset. The threshold of acceptability is set at R2 ≥ 0.6 for discharge and R2 ≥ 0.3 for SRP and TP (the use of different thresholds ensures that

for both sets of observations some behavioural simulations

could be critically evaluated to assess the quality and/or

deficiencies of the best model predictions sampled).
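The sampling and acceptance steps described above can be sketched in Python as follows. This is an illustrative sketch only, not the code used in this study: the function names, the three-parameter subset, the random seed and the use of NumPy are assumptions, and the efficiency is written in the standard Nash–Sutcliffe form rather than copied from Eqs. 1 and 2. The ranges are taken from Table 2, except for BFI, where the 0.6–0.8 range stated in the text is used. Running INCA-P for each sampled parameter set is outside the scope of the sketch.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - (residual sum of squares / observed variance sum)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual_ss = np.sum((observed - simulated) ** 2)
    total_ss = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual_ss / total_ss


def sample_uniform(ranges, n_samples, rng):
    """Draw independent uniform samples; `ranges` maps name -> (minimum, maximum)."""
    lows = np.array([low for low, _ in ranges.values()])
    highs = np.array([high for _, high in ranges.values()])
    return rng.uniform(lows, highs, size=(n_samples, len(ranges)))


# Illustrative subset of the 44 sampled parameters (names are hypothetical).
ranges = {
    "soil_water_time_constant": (0.5, 5.0),      # days
    "groundwater_time_constant": (10.0, 200.0),  # days
    "base_flow_index": (0.6, 0.8),               # dimensionless
}
rng = np.random.default_rng(seed=0)
samples = sample_uniform(ranges, n_samples=1000, rng=rng)

# Acceptability thresholds on the efficiency for each observed variable.
THRESHOLDS = {"discharge": 0.6, "TP": 0.3, "TRP": 0.3}


def is_behavioural(efficiency, variable):
    """A simulation is retained if its efficiency meets the threshold."""
    return efficiency >= THRESHOLDS[variable]
```

Each row of `samples` is one candidate parameter set; in a full analysis every row would drive one model run, and each run would be scored against each observed series.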

3 Results

The GLUE framework is not a parameter optimisation

tool, since it is based on the use of multiple acceptable

models in estimating the prediction uncertainties generated

from Monte Carlo simulations using different parameter

sets. However, the results from the parameter set which

gained the combined highest R2 score, summing the R2

values for discharge, TRP and TP (with no weightings on

each type of information), are compared with those pre-

sented in Wade et al. (2004) using their expert calibration

parameter set. The ‘best’ set from the Monte Carlo simulations improves on this expert calibration for: (a) all flow reaches (Reaches 4, 8 and 19 have an R2 of 0.58, 0.36 and 0.48 for the expert calibration and 0.68, 0.64 and 0.65 for the ‘best’ parameter set, respectively), and (b) all TP simulations with an R2 > 0 except for Reach 21 (Reaches 11, 12 and 21: expert calibration R2 = 0.31, 0.32 and 0.47; ‘best’ parameter set R2 = 0.37, 0.48 and 0.40). Although this ‘best’

parameter set generally performs better than the expertly

calibrated parameter set, this too is unable to produce any

simulations that improve on fitting a mean of the observed

data (R2 = 0) for those upland catchments with observed

data for TP (Reaches 1, 3, 6, 9–10). Neither set of simulations is able to reproduce the TRP with R2 ≥ 0 in any reach (Reaches 1, 3, 6, 9–12, 16, 19, 21).

This suggests that treating the Lugg catchment homogeneously, as described, has not negatively impacted the performance of the model when compared with the expertly calibrated parameter set (with spatially heterogeneous parameters).

Considering each reach independently, the GLUE

analysis is unable to identify any acceptable parameter sets

for the upland reaches when evaluated against the observed

TP data, and no acceptable parameter sets are found for any

reach for the TRP using the stipulated R2 criteria. This

Fig. 2 Relationship between behavioural parameter value (x-axis) and the performance measure (y-axis) calculated using observed flow data

suggests that, in this application, the process representa-

tions in INCA-P are better suited to the simulation of

reaches controlled by point inputs of phosphorus than

predicting the inputs from diffuse sources. In the River

Lugg, the mean TP concentration increases downstream

(see Fig. 7), suggesting additional phosphorus inputs from

sewage treatment works and trade effluents, especially

following the input from Leominster STW into Reach 10

(Wade et al. 2007).

Table 3 shows the number of acceptable parameter sets

remaining as each set of constraining data listed in column

one is used: when only the discharge at Reach 4 is con-

sidered, there are 119,371 acceptable runs; when discharge

at Reaches 4 and 8 are considered, there are 51,821

acceptable runs, and so forth. If all of the constraining data

were used (or indeed if any of the TRP data were used), no

acceptable simulations would remain, as shown in Table 3

(final row, second column). The third column shows how

many acceptable runs are achieved if only using one set of

constraining data at a time. Based on the stipulated R2

thresholds, INCA-P is incapable of producing acceptable

model simulations of the TP concentrations at all of the

reaches simultaneously using spatially lumped parameters.

However, if only the constraining data for discharge at

Reaches 4, 8 and 19 are used, the model is able to achieve an R2 performance of at least 0.67. This result shows that

discharge can be represented adequately by assuming the

catchment to be uniform, although some improvement

might be gained by taking the difference in groundwater

inputs to the different reaches into account. However, the difficulty in obtaining acceptable simulation results for TP and TRP may well relate to the model structure and input data (see discussion in later sections).
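The two counts reported in Table 3 (cumulative and per dataset) can be illustrated with a short Python sketch. The function name and the small set of efficiency scores below are invented purely for illustration and are not results from this study; the sketch assumes that one efficiency score per simulation is available for each constraining dataset.

```python
import numpy as np


def count_behavioural(scores, thresholds):
    """Count acceptable simulations as constraining datasets are added in order.

    scores     : dict mapping dataset name -> efficiencies, one per simulation
    thresholds : dict mapping dataset name -> acceptability threshold
    Returns a list of (name, cumulative count, independent count) tuples.
    """
    n_simulations = len(next(iter(scores.values())))
    still_acceptable = np.ones(n_simulations, dtype=bool)
    rows = []
    for name, values in scores.items():
        passes = np.asarray(values, dtype=float) >= thresholds[name]
        still_acceptable &= passes  # must satisfy every dataset used so far
        rows.append((name, int(still_acceptable.sum()), int(passes.sum())))
    return rows


# Invented scores for four simulations (not data from the Lugg application).
scores = {
    "Reach 4: flow": [0.70, 0.65, 0.50, 0.80],
    "Reach 8: flow": [0.62, 0.40, 0.70, 0.90],
    "Reach 21: TP": [0.10, 0.35, 0.40, 0.20],
}
thresholds = {"Reach 4: flow": 0.6, "Reach 8: flow": 0.6, "Reach 21: TP": 0.3}
rows = count_behavioural(scores, thresholds)
```

In this invented example two simulations pass the TP threshold on their own, but neither also passes both flow thresholds, so the cumulative count falls to zero. This is the same pattern as the TP rows of Table 3.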

3.1 Sensitivity to individual parameters

Sensitivity is assessed by visual inspection of dotty plots,

which represent projections of the response surface onto

single parameter axes. While such plots do not show the

complex interactions between parameters in a high-

dimensional parameter-space, the vast majority of model

parameters show no visually detectable sensitivity across

the range evaluated. Clearly, this could be further evaluated

using global sensitivity methods (e.g. Campolongo et al.

Fig. 3 Relationship between top 10% of parameter values (behavioural and non-behavioural) (x-axis) and the performance measure (y-axis) calculated using observed TP data from Reach 21: a Firmly Bound Organic Phosphorus Input Rate; b Firmly Bound Organic Phosphorus Output Rate; c Firmly Bound Inorganic Phosphorus Input Rate; d Firmly Bound Inorganic Phosphorus Output Rate; and e Groundwater Time Constant. Stars behavioural, dots non-behavioural

Table 3 The reduction in numbers of behavioural parameter sets for different constraining data as each dataset is sequentially added (second column) and for each reach considered independently (third column)

Constraining data^a   Cumulative acceptable simulations   Acceptable simulations per constraining dataset
Reach 4: flow         119,371                             119,371
Reach 8: flow         51,821                              96,332
Reach 19: flow        33,997                              83,247
Reach 11: TP          0                                   1
Reach 12: TP          0                                   3
Reach 21: TP          0                                   41

^a TP simulations for River Reaches 1, 3, 6, 9, and 10 do not produce any acceptable simulations and hence there are no cumulative acceptable simulations

2007; Jansen 1999), and this will be the focus of further

research. Figure 2 shows dotty plots only for parameters

showing sensitivity to the observed discharge data, and

Fig. 3 shows dotty plots for parameters showing sensitivity

to the TP observations. The performance measure used is

defined in Eq. 2.

3.1.1 Hydrology

The most sensitive hydrological parameters are initial

groundwater flow, soil water time constant, groundwater

time constant and BFI; this is consistent with the findings

of McIntyre et al. (2005) for INCA-N who, in addition,

identified initial soil water discharge as being sensitive

(BFI was not assessed in their study).

Some of the dotty plots shown reveal that the model

performance is still increasing at the edge of the sampled

range. The ranges sampled were initially identified in

Wade et al. (2004) or suggested in the INCA-P user guide

(Butterfield et al. 2004), although following the initial

sensitivity analysis some of these ranges are altered.

Unfortunately, in some cases, even these extended ranges

fail to show the complete pattern of parameter response.

However, it is beyond the project resources to extend the

ranges further and thus they will be assessed in more detail in future work.

Other than BFI, the response of Reach 8 is different to

the responses of the other two catchments (Fig. 2). The

groundwater time constant shows the biggest difference in

response by the three catchments. The difference seen at

Reach 8 may be caused by the driving meteorological data.

As previously explained, there are two meteorological

driving datasets derived from MORECS, one applied to the

upland catchments and one to the lowlands. Reach 8 is approximately centrally located, and was assigned the input hydrological data from Llandrindod Wells in the original report, which was assumed representative of the upland area of the catchment (Wade et al. 2004). The ratio between observed discharge and HER should be approximately 1; the ratios for the three catchments are 1.09, 0.84 and 1.31 for Reaches 4, 8 and 19, respectively. The ratios for Reaches 4 and 19 are both over one, indicating that there is more discharge than HER in the catchment, whereas Reach 8 shows the opposite.

Therefore, the similar responses of Reaches 4 and 19 and

the significant difference in Reach 8 are primarily con-

trolled by these HER differences. Reach 8 is routing excess

rainfall through the groundwater storage in order to pro-

duce the correct observed discharge (Fig. 2). We suggest

that heterogeneous parameters are therefore required,

although the possibility of having inadequate driving data

cannot be discounted. If the lowland rainfall data were

applied to Reaches 7–22, the ratio at Reach 8 would be

0.96. These findings illustrate the fundamental importance

of an accurate estimation of inputs to the system in

obtaining behavioural predictions (even if the perfect

model was available, see Beven (2006)).
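The text does not state how the discharge to HER ratio was computed. One plausible calculation, given here as a hedged Python sketch, converts the observed discharge to an equivalent depth over the upstream catchment area and divides by the total HER for the same period. The function names, the assumed units (daily mean discharge in m3 s-1, area in km2, daily HER in mm) and the example values are assumptions for illustration only.

```python
SECONDS_PER_DAY = 86400.0


def runoff_depth_mm(daily_discharge_m3s, area_km2):
    """Convert daily mean discharge to a total runoff depth (mm) over the catchment."""
    volume_m3 = sum(q * SECONDS_PER_DAY for q in daily_discharge_m3s)
    area_m2 = area_km2 * 1.0e6
    return volume_m3 / area_m2 * 1000.0  # m of depth -> mm


def discharge_to_her_ratio(daily_discharge_m3s, daily_her_mm, area_km2):
    """Ratio of observed runoff depth to hydrologically effective rainfall."""
    return runoff_depth_mm(daily_discharge_m3s, area_km2) / sum(daily_her_mm)


# Invented example: 1 m3 s-1 for 10 days from a 100 km2 catchment is 8.64 mm
# of runoff, which balances a HER of 0.864 mm per day over the same period.
example_ratio = discharge_to_her_ratio([1.0] * 10, [0.864] * 10, area_km2=100.0)
```

A ratio above one, as reported for Reaches 4 and 19, means that more water leaves the reach than the driving HER series supplies; a ratio below one, as for Reach 8, means the reverse.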

3.1.2 Total reactive and total phosphorus

To assess the response of the model and identify why it

might be failing, the top 10% of results are plotted; these

include acceptable and unacceptable model outputs (stars: acceptable; dots: unacceptable). As all the reaches show

similar responses (unlike the discharge), Fig. 3 shows only

the parameters at Reach 21 which appear visually sensitive.

The fact that all of the reaches produce similar responses

further confirms the acceptability of the approach of

treating phosphorus related parameters homogeneously.

It is worth highlighting here that the model fits are poor for TP and SRP simulations, and that acceptable simulations (R2 ≥ 0.3) are only obtained for Reaches 11, 12 and 21 for TP (see Table 3).

The groundwater time constant shows parameter sensi-

tivity to both observed discharge and TP data (Fig. 3e). The

sensitivity of the groundwater time constant is indicative of

the reliance of TP concentration on the accuracy of the

hydrology component. This is likely due to the parameter

controlling the amount of water available to dilute diffuse

and point source inputs to stream. Any water quality model

can only be as good as the water quantity model driving it.

The firmly bound phosphorus output rates relate to the

rate at which phosphorus is released into the available soil

phosphorus store and the input rates relate to the rate at

which phosphorus is locked away into the firmly bound

phosphorus store. Figure 3 shows that the best simulations

occur when the output rate is approaching zero and the

input rate is approaching the maximum of the suggested

range. Consequently, over time, there is a net increase in

the amount of firmly bound phosphorus locked away in the

soil (being made unavailable). However, the amount of

phosphorus being added from fertiliser and manure on an

annual basis remains constant within each catchment. The

mass balance calculated within INCA-P at the end of the

simulation confirms that the amount of firmly bound

phosphorus stored in the soil has increased for all land uses

in all reaches. The phenomenon of catchments acting as a

sink for phosphorus has been observed in other UK

catchments (Neal et al. 2004b).

3.2 Calculation of indicative prediction bounds

The acceptable parameter sets from the GLUE analysis

(with R2 > 0.6 for hydrology and R2 > 0.3 for TP and

SRP) are used to produce 5% and 95% prediction bounds.

The prediction bounds are calculated from the outputs of

all the behavioural models and for each reach length,

weighted by a ‘likelihood’ value that is conditional on the

choices identified for the GLUE analysis for each param-

eter set. Values below the acceptability threshold are

assigned a likelihood of zero and values above the

threshold are assigned a likelihood proportional to 1/

RMSE, rescaled such that they sum to unity. These

rescaled weights are then applied to the model outputs at

each timestep to create uncertainty bounds representing the

5th and 95th quantiles of predicted values. These quantiles

are then plotted as prediction bounds with the observed

data. Only the water year of 1995/1996 is presented.
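The weighting and quantile steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation used in this study: the function name, the array layout, the use of NumPy and the choice of taking the first sorted output whose cumulative weight reaches the requested quantile are all illustrative.

```python
import numpy as np


def glue_prediction_bounds(simulated, rmse, efficiency, threshold,
                           quantiles=(0.05, 0.95)):
    """Return (weights, bounds) for likelihood-weighted prediction quantiles.

    simulated  : array (n_simulations, n_timesteps) of model outputs
    rmse       : array (n_simulations,) of root mean square errors
    efficiency : array (n_simulations,) of R2 scores used for acceptance
    """
    simulated = np.asarray(simulated, dtype=float)
    rmse = np.asarray(rmse, dtype=float)
    behavioural = np.asarray(efficiency, dtype=float) >= threshold
    if not behavioural.any():
        raise ValueError("no behavioural simulations to weight")

    # Likelihood proportional to 1/RMSE for behavioural runs, zero otherwise.
    weights = np.zeros_like(rmse)
    weights[behavioural] = 1.0 / rmse[behavioural]
    weights /= weights.sum()  # rescale so that the weights sum to unity

    n_steps = simulated.shape[1]
    bounds = np.empty((len(quantiles), n_steps))
    for t in range(n_steps):
        order = np.argsort(simulated[:, t])
        cumulative = np.cumsum(weights[order])
        for k, q in enumerate(quantiles):
            # First sorted output whose cumulative weight reaches quantile q.
            index = min(np.searchsorted(cumulative, q), len(order) - 1)
            bounds[k, t] = simulated[order[index], t]
    return weights, bounds
```

Because rejected simulations carry zero weight, they cannot widen the bounds, however extreme their outputs are. If no simulation is behavioural, as is the case for TRP in this application, no bounds can be formed and the sketch raises an error.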

3.2.1 Hydrology

Figure 4 shows the observed and predicted bounds for

discharge for the water year 1995/1996 for Reaches 4, 8

and 19, respectively. It is worth noting that the GLUE

prediction limits are not confidence limits in a statistical

sense, but are rather derived from the likelihood weighted

distribution of outputs from all the behavioural models.

This means that deficiencies in the model performance are

not compensated by a statistical model of the errors; thus,

the observed dynamics not reproduced can be identified

(see Beven 2006 for a more complete discussion). Fig-

ure 4a shows that in Reach 4 the model under-predicts

discharge during the wet season, but seems to predict well

the discharge during the drier months of June to October.

Reach 8 (Fig. 4b) similarly shows a good prediction of

discharge in the dry season, but an over-prediction of peak

discharges and an under-prediction of the falling limb in

the wet season. Reach 19 (Fig. 4c) shows that the contri-

bution of groundwater during the drier months is over-

estimated, but the representation of the rising and falling

limbs in the wet season are visually good. The difference in

the ability of the model to accurately predict the observed

discharge during the drier months may be a result of using

the same BFI for all reaches. According to Wade et al.

(2004), the expertly calibrated BFI values for the three

reaches are 0.65, 0.7 and 0.63, respectively. However, the

EA hiflows database (http://www.environment-agency.gov.uk/hiflows) gives the BFIHOST values [BFI derived using the Hydrology of Soil Types (HOST) classification] as 0.593, 0.610 and 0.587, respectively. Although different, both agree in the ranking of the reaches, as do the peaks of the distributions derived from the GLUE simulations: 0.608, 0.609 and 0.603. It is suggested that

the use of a single BFI value has led to an over-prediction

in groundwater contribution during the drier months of

the year in Reach 19, based on the assumption that the

optimal homogeneous BFI value is higher than the true

BFI value of Reach 19, and lower than the true value at

Reach 8.

The difference in the ability of the model to accurately

predict the observed discharge during storm peaks may be

related to how the areas relating to each weather station are

divided. Reach 8 is very near the boundary between the

two respective areas, and the difference in HER between

the two areas is high. It is highly likely that the true HER at

Reach 8 is less than what is being used to drive the model

and hence the model over-predicts storm events in this

reach, whilst at Reach 4 the opposite appears to be true and

the model is under-predicting both storm peaks and

recession curves. The model at Reach 19 visually fits this

part of the observed hydrograph well. The difference in the

ability of the model to predict accurately the observed

discharge data during the falling limb of a storm event is

again related to using the same value for the BFI.

3.2.2 Phosphorus

Unlike the hydrology, there are very few model simulations

that reproduce the TP data acceptably and none that

reproduces the TRP data. As mentioned, the data being

used to assess the SRP model output is actually TRP. Wade

et al. (2007) demonstrate that INCA-P is capable of

reproducing the basic dynamics of the observed TRP, if not

the absolute values. However, even after the model output

has been adjusted (as described in Sect. 4.2) using a ratio

formulated between TRP and SRP, this study is still unable

to find any simulations which meet the fairly relaxed

threshold of R2 ≥ 0.3.

All of the reaches show a similar response to changes in

parameter values in predicting TP, and hence only pre-

diction bounds for Reach 21 with the largest number of

behavioural parameter sets (41, see Table 3) are shown to

assess the model responses (Figs. 5, 6). In all reaches, there

are observations that are not well reproduced by the

behavioural models.

Figure 5 does clearly show, however, the width of the

prediction bounds increasing during the autumn, indicating

the increase in uncertainty of TP concentration during this

‘flushing’ period. It also shows how the model underesti-

mates the concentration of phosphorus during the early

autumn flush (August and September), but is well within

the wide prediction bounds at the peak of the autumn flush

in October. Figure 7 shows the temporal and spatial vari-

ation of the observed TP for multiple reaches. A pattern of

an autumn flush can be clearly seen in most of the reaches.

The pattern is less clear in Reaches 1 and 6, where con-

centrations are lowest. There is a small sewage treatment

works entering into Reach 3 which has a significant impact

on the TP concentration compared to the neighbouring

reaches with observed data. The effect of phosphorus

stripping commencing in the sewage treatment works in the

latter half of the time series is also very apparent.

Fig. 4 1995/1996 observed flow with 5 and 95% GLUE prediction bounds for a Reach 4; b Reach 8; and c Reach 19

4 Discussion

The results show that under the assumptions detailed, all

the simulations run using INCA-P can be rejected as

unacceptable representations of the observed data when

evaluated over all the available observed data sets. This is

not the first time that a GLUE study has rejected all models

(Freer et al. 2003; Zak and Beven 1999a). Assessing all

models as unacceptable then requires a consideration of

why this is the case in this application.

4.1 Ability of INCA-P to produce behavioural

simulations

The success of a model in producing an acceptable simu-

lation depends on four aspects: an adequate model

structure; adequate input data to drive the model; adequate

observations with which to evaluate model performance,

and the definition of ‘acceptable’ behaviour. The difficulty

in finding acceptable model runs for TP and TRP in this

application may be a result of all these causes. The data do

Fig. 5 Observed TP concentrations with 5 and 95% GLUE prediction bounds for all 41 behavioural models in Reach 21, for water year 1995/1996

Fig. 6 Observed TP concentrations with 5 and 95% GLUE prediction bounds for all 41 behavioural models in Reach 21, for the entire study period. For comparison, the expert calibration derived simulation (not shown) is generally contained within the prediction bounds; it tends to be slightly under the lower bound during periods of low concentration

not allow us to speculate about whether some process

representations in INCA-P could be improved (although

see the next section), since the inputs with which the model

is being driven are so uncertain.

The model successfully produces the expected differ-

ence in response between a reach dominated by point

sources of phosphorus and a reach dominated by diffuse

sources (Wade et al. 2004, 2007). Figure 8 shows the

observed and simulated data for two differing reaches. The

simulated data points are those produced from the best

(highest total R2 across all performance measures) model

run from the Monte Carlo realisations. Figure 8a shows a

weak positive relationship between TP (Reach 1) and

discharge (Reach 4). Such a weak positive response is

indicative of an agricultural catchment, where rainfall

instigates the mobilisation and delivery of phosphorus

associated with soil particles to the channel (Jarvie et al.

2006; May et al. 2001; Wade et al. 2004). Jarvie et al.

Fig. 7 Available observed total phosphorus data for the study period for different river reaches to show the significant variability in observed TP concentrations

Fig. 8 Differences in the flow–TP concentration relationship between a diffuse (a) and a point (b) dominated reach: a Reach 4 flow plotted against Reach 1 TP concentration; b Reach 19 flow plotted against Reach 21 TP concentration. Crosses best Monte Carlo simulation, stars observed

(2006) identified the same response in the Chitterne

catchment, but attributed the relationship to the interactions

with sewage soakaways rather than overland diffuse pol-

lution. The weakness of the signal is caused by the

complex interactions of factors that directly affect phos-

phorus mobilisation and transport from land to water.

These factors include antecedent conditions, soil P status,

farm management styles (fertiliser usage and livestock

management) and rainfall (quantity and intensity). Fig-

ure 8b shows a weak negative relationship between TP

concentration (Reach 21) and discharge (Reach 19). The

response shows that TP concentrations decrease with dis-

charge (dilution), which is indicative of point sources of

phosphorus (Neal et al. 2004a). The few higher points at

high discharges are from the beginning of the time series

before phosphorus stripping was introduced at Leominster

STW. The highest point was taken during the first event

following the first dry spell of the time series. This appli-

cation of INCA-P only finds acceptable model runs for

reaches dominated by point sources. The signal from a

point source is likely to be much higher than from a diffuse

source and so easier to identify against background varia-

tion and measurement error. Wade et al. (2004) state that

the poor performance in Reach 1 is due to the inability of

INCA-P to predict responses where source areas and flow

pathways are highly heterogeneous, and also note that work

is required to improve the TRP simulations, particularly

where concentrations are low and the result of diffuse

pollution.

Diffuse pollution from agriculture can be very difficult

to predict because of the many factors driving it, and the

relatively low concentrations typically measured are sub-

ject to higher percentage error and are closer to detection

limits of analytical techniques. Diffuse sources are also

intermittent and transient, dependent on factors for which

there is little or no data, e.g. the sporadic cleaning out of a

cow yard or variable phosphorus transport pathways

(Beven et al. 2005). These kinds of incidences, if they

coincide with the taking of sparse observations of stream

concentrations (typical of a national monitoring network),

can dominate a weak signal, but are extremely difficult to

account for.

4.2 Failure of INCA-P to produce any acceptable

TRP runs

In this study, INCA-P does not reproduce acceptably the

observations of TRP as defined by the R2 threshold of

explaining more than 30% of the observed variance. A

visual comparison of the simulated and observed stream

water TRP dynamics (Fig. 6) does show that, in the lower

reaches dominated by point sources, the general patterns of

dilution during winter and concentration during summer

and a general downward trend in concentration due to

phosphorus removal from Leominster STW and Cadburys

are reproduced. Because no behavioural simulations are

identified in this study, no further analysis is done to look

at prediction bounds or sensitivities, but it is interesting to

consider here why no behavioural simulations are found.

SRP concentrations in INCA-P are calculated from the TP

concentration. INCA-P uses a linear relationship to relate

SRP to TP, whereas the observed relationship is more

complex.

INCA-P simulates SRP but the available phosphorus

fraction from the EA measurements is TRP. Using a ratio

of 1.32 derived from Haygarth et al. (1997) to estimate the

TRP from the model output of SRP data, it is possible to

say that both the expert calibration and the ‘best’ MC run

are still considered unacceptable, although the R2 scores

(not shown) are marginally higher (see Fig. 9 for the

adjusted relationship). Figure 9 further shows that although

INCA-P predicts the main trend of the relationship between

TRP and TP, it is the outliers in the observed data in this

relationship that are the primary reason for the failure of

Fig. 9 Relationship between total reactive phosphorus (TRP) concentration and total phosphorus (TP) concentration for all reaches. Crosses simulated (using expert calibration), stars observed. The plot shows that the observed TRP to TP relationship is variable, but the model, which initially predicts SRP, calculates a linear relationship to TP. In this case, the model output has been adjusted to better reflect TRP using a ratio derived from Haygarth et al. (1997)

the model to produce any acceptable simulations in the

GLUE simulations.

In developing INCA-P, data from the River Lambourn

shows a strong linear relationship between soluble unre-

active phosphorus and TP (Wade et al. 2002c). This

implies that soluble unreactive phosphorus can, therefore,

be represented as a constant fraction of TP. Wade et al.

(2002a, b, c) acknowledge that this is a simplification, and

have since added an additional parameter to allow the

proportion of TP existing as soluble unreactive phosphorus

to be varied by reach (which would allow the general trend

of the observations, but not the outliers, in Fig. 9 to be

more closely matched). However, they also state that the

soluble unreactive phosphorus proportion is not only spa-

tially variable but should also vary with discharge (Wade

et al. 2002a), so that the use of a constant proportion is still

a simplification. This raises an interesting problem in

model evaluation. If it is known a priori that the outliers in

Fig. 9 (or any other predicted variable) cannot be predicted

by the model, then should they be used in model evalua-

tion, should they be treated in terms of the ‘‘effective

observation error’’ discussed by Beven (2006), or should

some other explicit error model be used for the observa-

tions? These approaches remain to be tested in future

work.

4.3 Data limitations

The measured stream water phosphorus concentrations are

subject to high levels of uncertainty due to a number of

factors, as detailed in Jarvie et al. (2002): sampling

method, storage, and analytical techniques can all con-

tribute to an uncertain result. Some of the analytical

techniques used to determine fractions of phosphorus can

become unreliable when certain other compounds are also

present in the water sample. They also comment on how

much EA data, referred to as ‘orthophosphate as P’, is in

fact molybdate reactive phosphorus (or TRP) as the anal-

ysis is performed on unfiltered samples, and so will contain

some particulate phosphorus which is reactive to the

reagents.

There is also an issue of incommensurability between

what is modelled and the available observed phosphorus

data. Samples for phosphorus analysis are collected by the

EA as grab samples which are only truly representative of

that single point in time and space, whereas, in common with most models working on a daily timestep, INCA-P treats these data as representative of a daily mean value. However,

there are many studies showing how phosphorus concen-

trations can vary with discharge at sub-daily timesteps

(Cooke and Prepas 1998; Johnes 2007; Kronvang and

Bruhn 1996; May et al. 2001; Pionke et al. 1996; Svendsen

et al. 1995); hence, the assumption that a grab sample is

representative of a daily mean could have strong implica-

tions for the ability of the model to represent the observed

data. As well as the issue of incommensurability of the

observed data, there is also an issue with the amount of

data available and its information content. The sampling regime of the EA is driven by consent verification and EC Directives; samples are taken fairly infrequently at most locations (approximately monthly), and there is no attempt to obtain samples representative of the full discharge range. Hence, the data used to calibrate the parameter values have relatively low information content regarding the full dynamics of phosphorus transport.

Despite this, a user should still expect the model to

reproduce the majority of available data, if the measure-

ment and commensurability issues are allowed for.
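The commensurability problem described above can be illustrated with a small synthetic example. The sketch below compares a single grab sample with the daily mean and the flow-weighted daily mean on a day with a storm event. Everything in it is a hypothetical assumption for illustration: the hourly discharge series, the power-law concentration-discharge relationship and its coefficients, and the 10:00 sampling time; none of these are taken from the Lugg data.

```python
import numpy as np

hours = np.arange(24)

# Hypothetical hourly discharge (m3 s-1): baseflow plus a storm peak at 18:00
discharge = 2.0 + 6.0 * np.exp(-0.5 * ((hours - 18) / 2.0) ** 2)

# Hypothetical relationship: concentration (mg P l-1) rises with discharge
conc = 0.05 * discharge ** 0.8

def grab_sample(concentration, hour):
    """Concentration seen by a single grab sample taken at the given hour."""
    return concentration[hour]

daily_mean = conc.mean()
flow_weighted_mean = (conc * discharge).sum() / discharge.sum()
grab_10am = grab_sample(conc, 10)

# Relative difference between the grab sample and the daily mean
relative_bias = (grab_10am - daily_mean) / daily_mean

print(f"grab sample at 10:00 : {grab_10am:.3f} mg P l-1")
print(f"daily mean           : {daily_mean:.3f} mg P l-1")
print(f"flow-weighted mean   : {flow_weighted_mean:.3f} mg P l-1")
print(f"relative bias of grab: {relative_bias:.1%}")
```

In this constructed case a morning sample taken before the event underestimates the daily mean; a sample taken at the peak would overestimate it. The direction and size of the difference depend entirely on the assumed dynamics and sampling time.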

In this application of INCA-P, the hydrological driving

input data (HER, SMD and temperature) used are derived

from two single-site MORECS calculations. As mentioned

previously, there is a large difference in HER between the

two sites which is not seen in the average rainfall or dis-

charge observed in the Lugg catchment. The decision to

divide the catchment into upper and lower sections at

Reach 9 is inappropriate, and better fits might be found if the division occurred at Reach 7. It is also suggested that the use of MORECS-derived HER data, as opposed to rainfall data, may be inappropriate, and that a better fit might be obtained if more local HER data, i.e. from within the catchment, were available and used.

The phosphorus-related driving data are derived from a

number of sources (Wade et al. 2004, 2007). MORECS

uses an estimate of the growing season for each land use to

calculate evapotranspiration. These estimates are used

within INCA-P. The land use and livestock numbers are taken from a 1 km² grid provided by ADAS, based on the 1995 agricultural census figures, which have been re-mapped by ADAS to allow for undisclosed data. It is

assumed that the figures from 1995 would be representative

of the study period. Fertiliser practice has been generalised to a regional level; the monthly load estimates were based on a farm survey conducted in the River Ant catchment (Johnes and Butterfield 2003). A 70:30 split between

organic and inorganic phosphorus is also assumed. As

noted in Wade et al. (2007), due to the Non-Disclosure Act

(whereby a single holding cannot be identified), the actual

amount of phosphorus input may be lower than reported.

This may account for the large amount of phosphorus

accumulating within the firmly bound store. The data available for the sewage works and trade effluent discharges in this application are limited. For the smaller sites, an average TP concentration and discharge are applied across the entire study period, while the larger sites (Leominster STW and Cadburys Factory) had monthly averaged values.

It was also decided in the original studies (Wade et al.

2004, 2007) that only those effluents discharged directly into the main channel would be included, because point sources on tributaries cannot be included in a single model application. Wade et al. (2004, 2007) recognised that this may be an over-simplification, and further investigations to test the validity of this assumption are needed.

5 Conclusions

Despite the growing popularity of the INCA-P model, and

the suggested incorporation of it into the hierarchy of

models for use by the EA, there had been no peer-reviewed

consideration of the uncertainty in its predictions. This

paper has presented an uncertainty analysis of the INCA-P

model within the GLUE framework in an application to the

River Lugg (1,077 km²). It has improved understanding of

the performance of the INCA-P model through the con-

sideration of 200,000 Monte Carlo realisations of

parameter sets, and comparison of the model performance

using the best homogeneous parameter Monte Carlo run

and the expert calibrated heterogeneous parameter set.

Similar to the findings of McIntyre et al. (2005), in an

application of the nitrate version (INCA-N), it was found

that the most hydrologically sensitive parameters in the

Lugg application were groundwater-related. This was

expected due to the permeable nature of the catchment. A

large number of acceptable models of the hydrology were

found, although the prediction bounds over all behavioural

models showed some consistent departures from measured

discharges in the different reaches. The TP simulations

were sensitive to the groundwater parameters but also to

the parameters controlling the firmly bound phosphorus

stores. Although none of the parameter sets tried produced acceptable simulations of all of the TP and TRP concentration observations, as defined by an R² threshold of 0.3, it was

found that INCA-P was capable of reproducing the basic

difference between point and diffuse sources of pollution

and the seasonal and inter-annual patterns in the TP and

TRP concentrations in the lower reaches of the Lugg

system.
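The general procedure summarised above (Monte Carlo sampling of parameter sets, scoring each run with the Nash-Sutcliffe efficiency, retaining runs above a behavioural threshold, and forming prediction bounds from the retained runs) can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the INCA-P analysis itself: `toy_model` is a hypothetical two-parameter stand-in for the simulator, the data are synthetic, 2,000 runs replace the 200,000 used here, and weighting by the efficiency score with 5% and 95% bounds is only one of several possible choices.

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the obs mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def toy_model(forcing, gain, exponent):
    """Hypothetical stand-in for a catchment simulator."""
    return gain * forcing ** exponent

def weighted_quantile(values, weights, q):
    """Quantile q of values under the given (positive) weights."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    cdf /= cdf[-1]
    return values[order][np.searchsorted(cdf, q)]

rng = np.random.default_rng(42)

# Synthetic "observations": known parameters plus noise (assumed values)
forcing = rng.uniform(0.5, 5.0, size=60)
obs = toy_model(forcing, 1.5, 0.7) + rng.normal(0.0, 0.2, size=60)

# Monte Carlo sampling from uniform prior ranges (assumed ranges)
n_runs = 2000
gains = rng.uniform(0.1, 3.0, n_runs)
exponents = rng.uniform(0.1, 2.0, n_runs)
sims = np.array([toy_model(forcing, g, e) for g, e in zip(gains, exponents)])
scores = np.array([nash_sutcliffe(s, obs) for s in sims])

# Retain behavioural runs and weight them by their score
threshold = 0.3
behavioural = scores >= threshold
weights = scores[behavioural] / scores[behavioural].sum()
kept = sims[behavioural]

# Weighted 5% and 95% prediction bounds at each observation time
lower = np.array([weighted_quantile(kept[:, t], weights, 0.05)
                  for t in range(obs.size)])
upper = np.array([weighted_quantile(kept[:, t], weights, 0.95)
                  for t in range(obs.size)])

coverage = np.mean((obs >= lower) & (obs <= upper))
print(f"behavioural runs: {behavioural.sum()} of {n_runs}")
print(f"observations inside the bounds: {coverage:.0%}")
```

If no run reaches the threshold, as happened here for TP in the upland reaches and for TRP in every reach, the behavioural set is empty and no bounds can be formed without relaxing the rejection criterion.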

The failure of the model demonstrated here is in part a

result of limitations in the model structure, but also a result

of inadequate input data and measurement and commen-

surability errors in the observations with which the model

outputs were compared. Following a workshop held as part

of the Euro-limpacs project, INCA-P has since been

updated to better account for terrestrial sediment transport,

soil type, slope and particulate and soluble fractions of

phosphorus. Irrespective of changes to the model structure,

the application of any model depends on having adequate

input data in time and space, here in describing the vari-

ability in the hydrological conditions and flow pathways in

the different sub-catchments, in the spatial heterogeneity of

land use, and in the variability in inputs of phosphorus. In

addition, the data and knowledge are not available to

enable good prior estimates to be made of effective values

of so many parameters at the sub-catchment scales required

in such a model. Until these issues are confronted, the

performance of the INCA-P model and confidence in its

predictions will necessarily be limited.

5.1 Wider comment

The implementation of the WFD is driving a need for

models that can predict the impact of spatially distributed

phosphorus sources, changing flow-pathways and the likely

response to changes in climate, land-management and point

source inputs. INCA-P has the capability to make these

types of dynamic predictions, but this work has shown the

limited accuracy of the model predictions due to uncertainty

in the input data and in the representation of complex pro-

cesses by a simple model structure. As technologies for

remote sensing and high-frequency in-situ monitoring of

soil- and stream-waters become more reliable, widespread and cheaper, there is hope that new data will become

available that can be used to improve process representa-

tions and better constrain the uncertainty in predicting

phosphorus concentrations. It is generally the case, how-

ever, that improving process representations increases the

number of parameters that need to be estimated to run the

model. Thus, there will be an inevitable compromise

between improvements and uncertainty in applications to the majority of sites, which will not be data rich (Beven 2000, 2002). Krueger et al. (2007) have suggested that problems of

this type would require new modelling strategies within an

uncertainty learning framework for improving phosphorus

model development and evaluation. Importantly, we need to investigate further what the correct balance is between the process complexity of the models we develop and the quality and quantity of observations we will ever have to drive and evaluate models used to inform WFD policy decisions at national scales of interest.
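The tension between parameter numbers and what can be sampled or constrained can be made concrete with a little arithmetic. The sketch below asks how many distinct levels per parameter a regular grid would afford if it had the same number of points as the Monte Carlo sample. The grid analogy is an illustrative assumption and is not how the sampling in this study was carried out; the 200,000 realisations and 44 varied parameters are those of this study, while the choice of three levels per parameter is hypothetical.

```python
def levels_per_parameter(n_runs, n_params):
    """Levels per axis of a regular grid holding n_runs points in n_params dimensions."""
    return n_runs ** (1.0 / n_params)

def runs_for_levels(levels, n_params):
    """Number of runs needed for a full grid with the given levels per axis."""
    return levels ** n_params

study_levels = levels_per_parameter(200_000, 44)
print(f"equivalent grid levels per parameter: {study_levels:.2f}")  # roughly 1.3

# Runs needed to give every one of 44 parameters just three levels
needed = runs_for_levels(3, 44)
print(f"runs for 3 levels in 44 dimensions: {needed:.2e}")
```

On this crude measure, even a large sample explores little more than one level per parameter, and each additional parameter makes the space sparser still, which is why extra process detail needs to be matched by extra information in the observations.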

There is still much research to be done to build confi-

dence in this type of modelling. Ideally, model output

should be compared with long-term (>30 years) data sets

to identify if the model has been able to capture major

perturbations in a river system; this would provide some

confidence that a model was responsive to environmental

change. Such models can also be useful as ‘learning’ tools

for exploring the possible responses of a catchment to

environmental change and investigating the key factors and

processes operating. Such analysis may help to spark new

thoughts for experiments and monitoring with which to

assess the impacts of environmental change. The present

results suggest, however, that such studies need to estimate

the uncertainties involved in making such predictions and

explicitly consider the magnitude of such uncertainties in

drawing conclusions that might be later used in decision

making.

Acknowledgments This research was supported by NERC Grant Number NER/L/S/2001/00658. The contribution of AJ Wade and D Butterfield was supported by the Environment Agency Science Project P2-137.

Appendix

See Table 4.

Table 4 Constant parameter values not adjusted as part of the Monte Carlo simulations (multiple values in the ‘expert calibration’ column indicate that there is not just one parameter to represent all six land uses or all 22 reaches)

| Parameter | Expert calibration | Units |
|---|---|---|
| Max. temperature difference (applies to 6 land uses) | 0; 4.5 | °C |
| Fertilisation addition start day (applies to 6 land uses) | 0 | Day |
| Fertiliser addition period (applies to 6 land uses) | 0 | Day |
| Plant growth start day (applies to 6 land uses) | 1; 70; 77 | Day |
| Plant growth period (applies to 6 land uses) | 365; 160; 221 | Day |
| Diffuse direct runoff suspended sediment input (applies to 6 land uses) | 150 | mg l⁻¹ |
| Diffuse boron input (applies to 6 land uses) | 0 | mg l⁻¹ |
| Direct runoff sustainable flow (applies to 6 land uses) | 9,999 | m³ s⁻¹ |
| Soil water sustainable flow (applies to 6 land uses) | 9,999 | m³ s⁻¹ |
| Groundwater sustainable flow (applies to 6 land uses) | 9,999 | m³ s⁻¹ |
| Mineralisation temperature threshold (applies to 6 land uses) | 9,999 | °C |
| Immobilisation temperature threshold (applies to 6 land uses) | 9,999 | °C |
| Diffuse soil water suspended sediment input (applies to 6 land uses) | 80 | mg l⁻¹ |
| Diffuse groundwater suspended sediment input (applies to 6 land uses) | 10 | mg l⁻¹ |
| Initial suspended sediment | 5 | mg l⁻¹ |
| Initial boron | 0 | mg l⁻¹ |
| Initial macrophyte mass | 1 | g C m⁻² |
| Initial epiphyte mass | 0.01 | g C m⁻² |
| Sediment grain size | 100 | µm |
| Sediment settled or resuspended | 0.1 | kg |
| Initial live phytoplankton | 2.12 | µg Chl ‘a’ l⁻¹ |
| Initial dead phytoplankton | 1 | µg Chl ‘a’ l⁻¹ |
| Proportion of P in epiphytes | 0.0054 | g P (g C)⁻¹ |
| Half-saturation of P for epiphyte growth | 0.02 | mg P l⁻¹ |
| Half-saturation of P for macrophyte growth | 0.02 | mg P l⁻¹ |
| Macrophyte self-shading constant | 10 | g C m⁻² |
| Proportion of P in macrophytes | 0.0054 | g P (g C)⁻¹ |
| Phytoplankton death rate | 0 | Day⁻¹ |
| Phytoplankton growth rate | 0 | Day⁻¹ |
| Half-saturation constant for phytoplankton growth | 1 | mg P l⁻¹ |
| Phytoplankton self-shading constant | 1 | g C m⁻² |
| Dead phytoplankton settling rate | 0 | Day⁻¹ |
| Flow a parameter (applies to 22 reaches) | 0.04 | n/a |
| Flow b parameter (applies to 22 reaches) | 0.67 | n/a |
| Macrophyte temperature dependency (applies to 22 reaches) | 1.066 | n/a |
| Epiphyte temperature dependency (applies to 22 reaches) | 1.066 | n/a |
| Phytoplankton temperature dependency (applies to 22 reaches) | 1.066 | n/a |
| Bed suspension potential (applies to 22 reaches) | 0.1 | n/a |
| Bulk sediment density (applies to 22 reaches) | 2.65 | kg m⁻³ |

References

Arnscheidt J, Jordan P, Li S, McCormick S, McFaul R, McGrogan

HJ, Neal M, Sims JT (2007) Defining the sources of low-flow

phosphorus transfers in complex catchments. Sci Total Environ

382(1):1–13

Beven KJ (1996) A discussion of distributed hydrological modelling. In: Abbott MB, Refsgaard JC (eds) Distributed hydrological modelling. Kluwer, The Netherlands, pp 255–278

Beven KJ (2000) Uniqueness of place and process representations in

hydrological modelling. Hydrol Earth Syst Sci 4(2):203–213

Beven KJ (2002) Towards an alternative blueprint for a physically

based digitally simulated hydrologic response modelling system.

Hydrological Process 16(2):189–206

Beven KJ (2006) A manifesto for the equifinality thesis. J Hydrol

320(1–2):18–36

Beven KJ (2008) Environmental modelling: an uncertain future?

Routledge, London

Beven KJ, Binley A (1992) The future of distributed models—model

calibration and uncertainty prediction. Hydrological Process

6(3):279–298

Beven KJ, Freer J (2001) Equifinality, data assimilation, and

uncertainty estimation in mechanistic modelling of complex

environmental systems using the GLUE methodology. J Hydrol

249(1–4):11–29

Beven K, Heathwaite AL, Haygarth P, Walling DE, Brazier RE,

Withers PJA (2005) On the concept of delivery and nutrients to

stream channels. Hydrological Process 19(2):551–556

Brazier RE, Beven KJ, Freer J, Rowan JS (2000) Equifinality and uncertainty in physically based soil erosion models: application of the GLUE methodology to WEPP, the Water Erosion Prediction Project, for sites in the UK and USA. Earth Surf Process Landforms 25(8):825–845

Butterfield D, Wade AJ, Whitehead PG (2004) INCA-P v1.5 User

Guide, Aquatic Environments Research Centre, Reading

Campolongo F, Cariboni J, Saltelli A (2007) An effective screening

design for sensitivity analysis of large models. Environ Model

Softw 22(10):1509–1518

Cooke SE, Prepas EE (1998) Stream phosphorus and nitrogen export

from agricultural and forested watersheds on the Boreal Plain.

Can J Fish Aquat Sci 55(10):2292–2299

Freer J, Beven K, Ambroise B (1996) Bayesian estimation of

uncertainty in runoff prediction and the value of data: an

application of the GLUE approach. Water Resour Res

32(7):2161–2173

Freer JE, Beven KJ, Peters NE (2003) Multivariate seasonal period model

rejection within the generalised likelihood uncertainty estimation

procedure. In: Duan Q, Gupta H, Sorooshian S, Rousseau AN,

Turcotte R (eds) Calibration of watershed models. AGU, Water

Science and Application Series, Washington, pp 69–88

Freer JE, McMillan H, McDonnell JJ, Beven KJ (2004) Constraining

dynamic TOPMODEL responses for imprecise water table

information using fuzzy rule based performance measures.

J Hydrol 291(3–4):254–277

Gallart F, Latron J, Llorens P, Beven K (2007) Using internal

catchment information to reduce the uncertainty of discharge and

baseflow predictions. Adv Water Resour 30(4):808–823

Granlund K, Rankinen K, Lepisto A (2004) Testing the INCA model

in a small agricultural catchment in southern Finland. Hydrol

Earth Syst Sci 8(4):717–728

Haygarth PM, Warwick MS, House WA (1997) Size distribution of

colloidal molybdate reactive phosphorus in river waters and soil

solution. Water Res 31(3):439–448

Heathwaite AL, Fraser AI, Johnes PJ, Hutchins M, Lord E, Butterfield

D (2003) The phosphorus indicators tool: a simple model of

diffuse P loss from agricultural land to water. Soil Use Manage

19(1):1–11

Hough MN, Jones JA (1997) The United Kingdom Meteorological

Office rainfall and evaporation calculations system: MORECS

version 2.0—an overview. Hydrol Earth Syst Sci 1(2):227–239

Irvine K, Mills P, Bruen M, Walley W, Hartnett M, Black A, Tynan S,

Duck R, Bragg O, Rowen J, Wilson J, Johnston P, O’Toole C

(2005) Water framework directive—the application of mathe-

matical models as decision-support tools (2002-W-DS–11),

Environmental Protection Agency, Johnstown Castle

Jansen MJW (1999) Analysis of variance designs for model output.

Comput Phys Commun 117(1–2):35–43

Jarvie HP, Neal C, Withers PJA (2006) Sewage-effluent phosphorus:

a greater risk to river eutrophication than agricultural phospho-

rus? Sci Total Environ 360(1–3):246–253

Jarvie HP, Withers PJA, Neal C (2002) Review of robust measure-

ment of phosphorus in river water: sampling, storage,

fractionation and sensitivity. Hydrol Earth Syst Sci 6(1):113–131

Johnes PJ (1996) Evaluation and management of the impact of land

use change on the nitrogen and phosphorus load delivered to

surface waters: the export coefficient modelling approach.

J Hydrol 183(3–4):323–349

Johnes PJ (2007) Uncertainties in annual riverine phosphorus load

estimation: impact of load estimation methodology, sampling

frequency, baseflow index and catchment population density.

J Hydrol 332(1–2):241–258

Johnes PJ, Butterfield D (2003) Export coefficient model runs for the Hampshire Avon and the Herefordshire Wye catchments, based on 1 km² grid scale data from the 1995 Annual Agricultural Census returns. Aquatic Environments Research Centre, University of Reading, Reading

Table 4 continued

| Parameter | Expert calibration | Units |
|---|---|---|
| Porosity (applies to 22 reaches) | 0.3 | h |
| Macrophyte growth rate (applies to 22 reaches) | 0 | Day⁻¹ |
| Macrophyte death rate (applies to 22 reaches) | 0 | Day⁻¹ |
| Epiphyte growth rate (applies to 22 reaches) | 0 | Day⁻¹ |
| Epiphyte death rate (applies to 22 reaches) | 0 | Day⁻¹ |
| Bed sediment depth (applies to 22 reaches) | 0.3 | m |
| Effluent boron concentration (applies to 22 reaches) | 0 | mg l⁻¹ |
| Delta (applies to 22 reaches) | 0 | n/a |

See Wade et al. (2002a) for parameter definitions.

Jordan P, Arnscheidt A, McGrogan H, McCormick S (2007)

Characterising phosphorus transfers in rural catchments using a

continuous bank-side analyser. Hydrol Earth Syst Sci 11(1):372–

381

Jordan P, Menary W, Daly K, Kiely G, Morgan G, Byrne P, Moles R

(2005) Patterns and processes of phosphorus transfer from Irish

grassland soils to rivers—integration of laboratory and catch-

ment studies. J Hydrol 304(1–4):20–34

Kirchner JW, Feng XH, Neal C, Robson AJ (2004) The fine structure

of water-quality dynamics: the (high-frequency) wave of the

future. Hydrological Process 18(7):1353–1359

Kronvang B, Bruhn AJ (1996) Choice of sampling strategy and

estimation method for calculating nitrogen and phosphorus

transport in small lowland streams. Hydrological Process

10(11):1483–1501

Krueger T, Freer J, Quinton JN, Macleod CJA (2007) Processes

affecting transfer of sediment and colloids, with associated

phosphorus, from intensively farmed grasslands: a critical note

on modelling of phosphorus transfers. Hydrological Process

21(4):557–562

Langan SJ, Wade AJ, Smart R, Edwards AC, Soulsby C, Billett MF,

Jarvie HP, Cresser MS, Owen R, Ferrier RC (1997) The

prediction and management of water quality in a relatively

unpolluted major Scottish catchment: current issues and exper-

imental approaches. Sci Total Environ 194:419–435

Limbrick KJ, Whitehead PG, Butterfield D, Reynard N (2000)

Assessing the potential impacts of various climate change

scenarios on the hydrological regime of the River Kennet at

Theale, Berkshire, south-central England, UK: an application

and evaluation of the new semi-distributed model, INCA. Sci

Total Environ 251:539–555

Liu SM, Brazier R, Heathwaite L (2005) An investigation into the inputs controlling predictions from a diffuse phosphorus loss model for the UK; the Phosphorus Indicators Tool (PIT). Sci Total Environ 344(1–3):211–223

May L, House WA, Bowes M, McEvoy J (2001) Seasonal export of

phosphorus from a lowland catchment: upper River Cherwell in

Oxfordshire, England. Sci Total Environ 269(1–3):117–130

McIntyre NR, Wheater HS (2004) Calibration of an in-river

phosphorus model: prior evaluation of data needs and model

uncertainty. J Hydrol 290(1–2):100–116

McIntyre N, Jackson B, Wade AJ, Butterfield D, Wheater HS (2005)

Sensitivity analysis of a catchment-scale nitrogen model.

J Hydrol 315(1–4):71–92

Neal C, Jarvie HP, Wade AJ, Neal M, Wyatt R, Wickham H, Hill L,

Hewitt N (2004a) The water quality of the LOCAR Pang and

Lambourn catchments. Hydrol Earth Syst Sci 8(4):614–635

Neal C, Skeffington R, Neal M, Wyatt R, Wickham H, Hill L, Hewitt

N (2004b) Rainfall and runoff water quality of the Pang and

Lambourn, tributaries of the River Thames, south-eastern

England. Hydrol Earth Syst Sci 8(4):601–613

Page T, Beven KJ, Whyatt D (2004) Predictive capability in

estimating changes in water quality: long-term responses to

atmospheric deposition. Water Air Soil Pollut 151(1–4):215–

244

Page T, Haygarth PM, Beven KJ, Joynes A, Butler T, Keeler C, Freer

J, Owens PN, Wood GA (2005) Spatial variability of soil

phosphorus in relation to the topographic index and critical

source areas: sampling for assessing risk to water quality.

J Environ Q 34(6):2263–2277

Pionke HB, Gburek WJ, Sharpley AN, Schnabel RR (1996) Flow and

nutrient export patterns for an agricultural hill-land watershed.

Water Resour Res 32(6):1795–1804

Raat KJ, Vrugt JA, Bouten W, Tietema A (2004) Towards reduced

uncertainty in catchment nitrogen modelling: quantifying the

effect of field observation uncertainty on model calibration.

Hydrol Earth Syst Sci 8(4):751–763

Radwan M, Willems P, Berlamont J (2004) Sensitivity and uncer-

tainty analysis for river quality modelling. J Hydroinformatics

6:83–99

Rankinen K, Karvonen T, Butterfield D (2006) An application of the

GLUE methodology for estimating the parameters of the INCA-

N model. Sci Total Environ 365(1–3):123–139

Rode M, Suhr U, Wriedt G (2007) Multi-objective calibration of a

river water quality model—information content of calibration

data. Ecol Model 204(1–2):129–142

Singh AP, Ghosh SK, Sharma P (2007) Water quality management of

a stretch of river Yamuna: an interactive fuzzy multi-objective

approach. Water Resour Manage 21(2):515–532

Spear RC, Hornberger GM (1980) Eutrophication in Peel Inlet II:

identification of crucial uncertainties via generalised sensitivity

analysis. Water Res 14:43–49

Svendsen LM, Kronvang B, Kristensen P, Graesbol P (1995) Dynamics

of phosphorus-compounds in a lowland river system—importance

of retention and nonpoint sources. Hydrological Process

9(2):119–142

van Griensven A, Meixner T (2006) Methods to quantify and identify

the sources of uncertainty for river basin water quality models.

Water Sci Technol 53(1):51–59

Wade AJ, Hornberger GM, Whitehead PG, Jarvie HP, Flynn N (2001)

On modeling the mechanisms that control in-stream phosphorus,

macrophyte, and epiphyte dynamics: an assessment of a new

model using general sensitivity analysis. Water Resour Res

37(11):2777–2792

Wade AJ, Whitehead PG, Butterfield D (2002a) The Integrated

Catchments model of Phosphorus dynamics (INCA-P), a new

approach for multiple source assessment in heterogeneous river

systems: model structure and equations. Hydrol Earth Syst Sci

6(3):583–606

Wade AJ, Whitehead PG, Hornberger G, Jarvie HP, Flynn N (2002b)

On modelling the impacts of phosphorus stripping at sewage works

on in-stream phosphorus and macrophyte/epiphyte dynamics: a

case study for the River Kennet. Sci Total Environ 282:395–415

Wade AJ, Whitehead PG, Hornberger GM, Snook DL (2002c) On

modelling the flow controls on macrophyte and epiphyte

dynamics in a lowland permeable catchment: the River Kennet,

southern England. Sci Total Environ 282:375–393

Wade AJ, Raat KJ, Butterfield D, Whitehead PG (2004) Effectiveness

of Eutrophication Control by Phosphorus Reduction: Develop-

ment of the INCA-P Model (Science Report: SC980009/SR)

Wade AJ, Butterfield D, Griffiths T, Whitehead PG (2007) Eutrophi-

cation control in river-systems: an application of INCA-P to the

River Lugg. Hydrol Earth Syst Sci 11(1):584–600

Zak S, Beven KJ (1999a) Equifinality, sensitivity and predictive

uncertainty in the estimation of critical loads. J Environ Q

236:191–214

Zak SK, Beven KJ (1999b) Equifinality, sensitivity and predictive

uncertainty in the estimation of critical loads. Sci Total Environ

236(1–3):191–214
