A new wavelet―bootstrap―ANN hybrid model for daily discharge forecasting

22
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/236657120 A new waveletbootstrapANN hybrid model for daily discharge forecasting Article in Journal of Hydroinformatics · July 2011 DOI: 10.2166/hydro.2010.142 CITATIONS 36 READS 216 2 authors: Mukesh Kumar Tiwari College of Agricultural Engineering and Techno… 26 PUBLICATIONS 338 CITATIONS SEE PROFILE Chandranath Chatterjee Indian Institute of Technology Kharagpur 136 PUBLICATIONS 901 CITATIONS SEE PROFILE All content following this page was uploaded by Chandranath Chatterjee on 01 December 2016. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.

Transcript of A new wavelet―bootstrap―ANN hybrid model for daily discharge forecasting

Seediscussions,stats,andauthorprofilesforthispublicationat:https://www.researchgate.net/publication/236657120

Anewwavelet―bootstrap―ANNhybridmodelfordailydischargeforecasting

ArticleinJournalofHydroinformatics·July2011

DOI:10.2166/hydro.2010.142

CITATIONS

36

READS

216

2authors:

MukeshKumarTiwari

CollegeofAgriculturalEngineeringandTechno…

26PUBLICATIONS338CITATIONS

SEEPROFILE

ChandranathChatterjee

IndianInstituteofTechnologyKharagpur

136PUBLICATIONS901CITATIONS

SEEPROFILE

AllcontentfollowingthispagewasuploadedbyChandranathChatterjeeon01December2016.

Theuserhasrequestedenhancementofthedownloadedfile.Allin-textreferencesunderlinedinbluearelinkedtopublicationsonResearchGate,lettingyouaccessandreadthemimmediately.

A new wavelet–bootstrap–ANN hybrid model

for daily discharge forecasting

Mukesh K. Tiwari and Chandranath Chatterjee

ABSTRACT

A new hybrid model, the wavelet–bootstrap–ANN (WBANN), for daily discharge forecasting is

proposed in this study. The study explores the potential of wavelet and bootstrapping techniques

to develop an accurate and reliable ANN model. The performance of the WBANN model is also

compared with three more models: traditional ANN, wavelet-based ANN (WANN) and bootstrap-

based ANN (BANN). Input vectors are decomposed into discrete wavelet components (DWCs)

using discrete wavelet transformation (DWT) and then appropriate DWCs sub-series are used as

inputs to the ANN model to develop the WANN model. The BANN model is an ensemble of several

ANNs built using bootstrap resamples of raw datasets, whereas the WBANN model is an

ensemble of several ANNs built using bootstrap resamples of DWCs instead of raw datasets. The

results showed that the hybrid models WBANN and WANN produced significantly better results

than the traditional ANN and BANN, whereas the BANN model is found to be more reliable and

consistent. The WBANN and WANN models simulated the peak discharges better than the ANN

and BANN models, whereas the overall performance of WBANN, which uses the capabilities of

both bootstrap and wavelet techniques, is found to be more accurate and reliable than the

remaining three models.

Key words 9999 bootstrap, discharge, forecasting, resampling, wavelet

INTRODUCTION

Daily discharge forecasting is desirable for water resource

planning and management. A wide variety of rainfall–runoff

models have been developed and applied for discharge fore-

casting. Most of the rainfall–runoff models have been devel-

oped either based on a mechanistic approach or on a systems

theoretic approach. Hydrological prediction usually relies on

incomplete and uncertain process descriptions that have been

deduced from sparse and noisy datasets. Spatially distributed

modeling is a typical example of the mechanistic approach

to construct a model that explicitly accounts for as much of

the small-scale physics and the natural heterogeneity as

computationally possible (Loague & VanderKwaak 2004).

The approach has been criticized for resulting in models

that are overly complex, leading to problems of over-

parameterization and equifinality (Beven 2006), which may

manifest itself in large prediction uncertainty (Uhlenbrook

et al. 1999), while in the system theoretic approach the

concern is with the system operation, not the nature of the

system by itself or the physical laws governing its operation.

Black box models in the form of artificial neural networks

(ANNs) have gained momentum in the last few decades for

river flow forecasting. The FFBP (Feed Forward Back Pro-

pagation) is the most popular ANN training method in water

resources literature. The major advantage of the FFBP ANN

is that it is less complex than other ANNs such as Radial

Basis Function (RBF) and has the same nonlinear input–

output mapping capability (Sudheer & Jain 2003; Coulibaly &

Evora 2007). Some researchers have successfully applied

fuzzy inference systems in river flow forecasting (Nayak

et al. 2004, 2005a; Mukerji et al. (2009); Pramanik & Panda

Mukesh K. TiwariChandranath Chatterjee (corresponding author)Agricultural and Food Engineering Department,Indian Institute of Technology,Kharagpur 721 302,IndiaE-mail: [email protected]

doi: 10.2166/hydro.2010.142

& IWA Publishing 2011500 Journal of Hydroinformatics 9999 9999 201113.3

2009). Jacquin & Shamseldin (2009) reviewed the existing

studies dealing with the use of fuzzy inference systems in river

flow forecasting. This review shows that fuzzy inference

systems can be used as effective tools for river flow forecast-

ing, even though their application is rather limited in com-

parison to the popularity of neural network models. They

found that there are several unresolved issues requiring

further attention before more clear guidelines for the applica-

tion of fuzzy inference systems can be given. Wu & Chau

(2006) observed that the hybrid GA-based ANN algorithm,

under cautious treatment to avoid over-fitting, is able to

produce better accuracy in performance, although at the

expense of additional modeling parameters and longer com-

putation time. ANNs are known to have several dozens of

successful applications in river basin management and related

problems (Solomatine & Ostfeld 2008). The ANNs are still

widely applied and are a very popular tool compared to other

data-driven techniques in river basin management. Therefore

in this study we have used the FFBP ANN model as it has

been increasingly used in rainfall–runoff modeling and river

discharge forecasting due to their ability to map complex

nonlinear rainfall–runoff relationships (Baratti et al. 2003;

Cigizoglu 2003a, b; Jain & Srinivasulu 2004; Altunkaynak

2007; Demirel et al. 2009). Substantial literatures on ANN

have been reported in ASCE (2000a, b).

The reliability of the hydrological predictions is affected

by three major sources of uncertainties (Bates & Townley

1988): data uncertainty (quality and representativeness of

data), model structure uncertainty (ability of the model to

describe the catchment’s response) and parameter uncer-

tainty (adequate values of model parameters). Han et al.

(2007) studied the uncertainties involved in real-time predic-

tion in using an ANN model. It was concluded that, for long-

term predictions, the ANN showed superior performance but

that was only probabilistic, depending on how the calibration

and test events were arranged. Srivastav et al. (2007) pro-

posed a method of uncertainty analysis for ANN hydrological

models and showed that the ANN predictions contain a

significant amount of uncertainty. Boucher et al. (2009)

analyzed one-day-ahead hydrological ensemble forecasts

obtained by stacked neural networks and found that ensem-

ble forecasts outperform point forecasts. In order to over-

come the limitations inherent in the conventional treatment

of uncertainty in ANN model predictions, the recent trend

has been to combine the outputs of several member bootstrap

ANN models to reduce the uncertainty involved by control-

ling the generalization of the final predictive model and to

produce more reliable and consistent predictions (Cannon &

Whitfield 2002; Jeong & Kim 2005).

The bootstrap is a computational procedure that uses

intensive resampling with replacement, in order to reduce

uncertainty (Efron & Tibshirani 1993). In addition, it is the

simplest approach since it does not require complex compu-

tations of derivatives and Hessian matrix inversion involved

in linear methods or the Monte Carlo solutions of the

integrals involved in the Bayesian approach (Dybowski &

Roberts (2001)). Applications of the bootstrap method range

from estimating means, confidence intervals and parameter

uncertainties to network design techniques (Lall & Sharma

1996; Sharma et al. 1997; Tasker & Dunne 1997). The boot-

strap technique has also been used in artificial neural network

model development. Abrahart (2003) employed the bootstrap

technique to continuously sample the input space in the

context of rainfall–runoff modeling and reported that it

offered marginal improvement in terms of greater accuracies

and better global generalizations. Anctil & Lauzon (2004)

recommended the joint use of stop training or Bayesian

regularization with either bagging or boosting for improve-

ment and stability in modeling performance. Jeong & Kim

(2005) used ensemble neural networks (ENN) using the boot-

strap technique to simulate monthly rainfall–runoff. They

concluded that ENN is less sensitive to the input variable

selection and the number of hidden nodes than the single

neural network (SNN). Jia & Culver (2006) used the boot-

strap technique to estimate the generalization errors of neural

networks with different structures and to construct the con-

fidence intervals for synthetic flow prediction with a small

data sample. Sharma & Tiwari (2009) applied the bootstrap

technique for monthly runoff prediction and showed that the

BANN model consistently outperformed the ANN models.

Tiwari & Chatterjee (2010) applied the bootstrap technique

for hourly flood forecasting and showed that the bootstrap

technique is capable of quantifying uncertainty in flood

forecasts and ensemble predictions are more stable and

accurate.

A major criticism of ANN models is their limited ability to

account for any physics of the hydrologic processes in a

catchment (Aksoy et al. 2007; Koutsoyiannis (2007)). Daily

501 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

discharge is widely perceived as being nonlinear and nonsta-

tionary (Rao et al. 2003; Wang et al. 2006). Nonstationarity in

the data such as trends and seasonal variations influences the

simulation of daily discharge greatly and often causes poor

predictability in practical applications. The pronounced sea-

sonality is driven by different underlying physical processes of

streamflow generation for different periods. For instance, the

low flows are mainly sustained by base flows, while high

flows are affected by intensive rainfalls.

Wavelet transforms, which provide information in both

time and frequency domains of the signal, give considerable

information about the physical structure of the data, and

wavelet analysis provides a time–frequency representation

of a signal at many different periods in the time domain

(Daubechies 1990). In spite of the suitable flexibility of ANN

in modeling hydrologic time series such as runoff (Hsu et al.

1995; Zhang & Dong 2001) non-stationarity in the signal limits

the use of ANN. In such a situation, ANNs will not produce

good results with nonstationary data if pre-processing of the

input and/or output data is not performed (Cannas et al.

2006). The wavelet transformation technique decomposes the

original time series data into sub-time series of different

resolution levels, and these wavelet-transformed data

improve the performance of a forecasting model by capturing

different information on various resolution levels. The sub-

time series obtained using wavelet decomposition of the

signal of different resolution levels provides an interpretation

of the series structure and extracts the useful information.

This is the reason that this technique is largely applied to time

series analysis of nonstationary signals (Nason & Von Sachs

1999). In the recent past, wavelet transforms have become a

useful method for analyzing variations, periodicities and

trends in time series (Xingang et al. 2003; Yueqing et al.

2004; Partal & Kucuk 2006). Recently, the WANN model

has been successfully employed in some hydrology and water

resource studies. For example, Wang & Ding (2003) proposed

a wavelet network model with a combination of the wavelet

transform and the ANN, and decomposed the original time

series into periodic components by wavelet transform. Later,

sub-time series were used as the inputs for ANNs and the

resulting model was applied to forecast the original time

series. This approach was used for monthly groundwater

level and daily discharge forecasting. Kim & Valdes (2003)

presented a hybrid neural network model combined with

dyadic wavelet transforms to improve the forecasting of

regional droughts. These researchers used the neural network

model in two phases. First, neural networks were employed

to forecast the signals decomposed by wavelet transform at

various resolution levels and then the forecasted decomposed

signals were reconstructed into the original time series. The

researchers applied this model to the monthly and annual

inflow and rainfall data and showed that the model signifi-

cantly improved the neural network’s forecasting perfor-

mance. Anctil & Tape (2004) used a wavelet–neural

network model for one-day-ahead rainfall–runoff forecasting.

The time series was decomposed by wavelet transform into

three sub-series: short, intermediate and long wavelet periods.

Then, multiple-layer ANN forecasting models were trained

for each wavelet-decomposed sub-series, and later forecasted

decomposed signals were reconstructed into the original time

series. Partal & Cigizoglu (2008) predicted the suspended

sediment load in rivers by a combined wavelet–ANN method.

Measured data were decomposed into wavelet components

via DWT, and the new wavelet series, consisting of the sum of

selected wavelet components, was used as input for the ANN

model. The wavelet–ANN model provided a good fit to

observed data for the testing period. Zhou et al. (2008)

developed a wavelet predictor–corrector model for prediction

of monthly discharge time series and showed that the model

has higher prediction accuracy than ARIMA and seasonal

ARIMA. Hence a hybrid ANN–wavelet model which uses

multi-scale signals as input data may present more reliable

forecasting rather than a single pattern input. Adamowski

(2008a, b) developed a new method of standalone short-term

river flood forecasting based on wavelet and cross-wavelet

analysis. He found that the proposed wavelet-based forecast-

ing method can be used with great accuracy as a standalone

forecasting method for short-term river flood forecasting. It

was also shown that forecasting models based on wavelet and

cross-wavelet constituent components for forecasting river

floods were not accurate for longer lead time forecasting with

the artificial neural network models providing more accurate

results. Kisi (2008) investigated the accuracy of the neuro-

wavelet (NW) technique for modeling monthly streamflows.

The NW and ANN models were tested by applying them to

various input combinations of monthly streamflow data from

Gerdelli and Isakoy Stations in the Eastern Black Sea region

of Turkey. The test results indicated that the DWT could

502 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

increase the accuracy of ANN models in modeling monthly

streamflows. The comparison results showed that the NW

model performed much better than the ANN, multi-linear

regression (MLR) and autoregressive (AR) models. He

recommended further studies using more data from different

areas to strengthen these conclusions. Kisi (2009) proposed

the application of a conjunction model (neurowavelet) for

forecasting daily intermittent streamflow. The comparison

results revealed that the suggested model could significantly

increase the forecast accuracy of single ANN in forecasting

daily intermittent streamflows. Partal (2009) employed the

wavelet–neural network structure that combines wavelet

transform and artificial neural networks to forecast the river

flows of Turkey. He studied the performance comparison of

generalized neural networks and radial basis neural networks

with feed-forward back-propagation methods. Six different

models were studied for forecasting of monthly river flows. It

was seen that the wavelet and feed-forward back-propagation

model was superior to the other models in terms of selected

performance criteria. Satyajirao & Krishna (2009) proposed a

wavelet–neural network (WNN) model, based on a combina-

tion of wavelet analysis and ANN, in order to improve the

precision of daily streamflow and monthly groundwater level

forecasts where there is a scarcity of other hydrological time

series data. The results of daily streamflow and monthly

groundwater level series modeling indicated that the perfor-

mances of WNN models are more effective than the ANN

models. The study also highlights the capability of WNN

models in estimating low and high values in the hydrological

time series data. Wang & Meng (2007) analyzed character-

istics, abruptions, tendencies and causes of annual runoff

variations in the Heihe River drainage basin by the wave-

let–neural network model, and detected the basic variation

characteristics and tendencies of the Heihe River surface

hydrological system. Wang et al. (2009) developed a new

hybrid model, the wavelet–network model (WNM), based on

reasonable combination of wavelet analysis with ANN. The

results from a given case study showed that the WNM has

some significant advantages for short- and long-term predic-

tion of runoff in hydrology. The WNM was compared with

the threshold auto-regressive model (TAR). The accuracy of

prediction using the WNM was better than the TAR. They

recommended further researches on investigating the method

for reasonable selection of resolution level, which is an

important parameter in WNM. Partal & Cigizoglu (2009)

predict the daily precipitation from meteorological data

from Turkey using the wavelet–neural network method. The

new approach in estimating the peak values showed a notice-

ably high positive effect on the performance evaluation

criteria.

To the best of our knowledge, no study has been reported

in the hydrologic literature that has used the combined

strength of wavelet and bootstrap-based ANNs for hydrolo-

gical modeling. The present study is the first application

where ANN models are coupled with DWT and the bootstrap

technique to utilize the individual strength of each approach.

An attempt is made to forecast river discharge for 1 to 5 days

using the novel approach that uses wavelet and bootstrap-

based ANNs in a large river basin where flood forecasting

and water resources planning are critical issues.

METHODOLOGY

Artificial neural networks

Artificial neural networks are information processing systems

composed of simple processing elements (nodes) linked by

weighted synaptic connections (Muller & Reinhardt 1991).

Neural network models are developed by training the net-

work to represent the relationships and processes that are

inherent within the data. Being essentially nonlinear regres-

sion models, they perform an input–output mapping using a

set of interconnected simple processing nodes or neurons.

They reconstruct the complex nonlinear input/output rela-

tions by combining multiple simple functions, by analogy with

the functioning of the human brain. This approach is fast and

robust in noisy environments, flexible in the range of pro-

blems it can solve, and highly adaptive to newer environ-

ments. Owing to these established advantages, ANNs

currently have numerous real-world applications, such as

time series prediction, rule-based control and rainfall–runoff

modeling (Jain et al. 1999). The multilayer feed-forward

neural network consists of a set of sensory units that con-

stitute the input layer, one or more hidden layers of computa-

tion nodes and an output layer of computation nodes. The

input signal propagates through the network in a forward

direction, layer by layer. These neural networks are commonly

503 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

referred to as multilayer perceptrons. A detailed explanation

of different properties of ANN are beyond the scope of this

paper. Interested readers are directed to refer to texts such as

Bishop (1995) and Haykin (1999) for discussion on the general

properties of ANN and Maier & Dandy (2000) for an over-

view of different applications of ANN in water resources.

The wavelet analysis

Wavelet analysis is a multiresolution analysis in the time and

frequency domains and is the important derivative of the

Fourier transform. The wavelet function cðtÞ, called the

mother wavelet, has finite energy and is mathematically

defined asRþN�N cðtÞdt ¼ 0.

ca;bðtÞ can be obtained as

ca;bðtÞ ¼ aj j�12c

t� ba

� �ð1Þ

where a and b are real numbers, ca;bðtÞ ¼ successive wavelet,

a¼ scale or frequency factor and b¼ time factor. Thus, the

wavelet transform is a function of two variables, a and b. The

parameter a can be interpreted as a dilation (a41) or con-

traction (ao1) factor of the wavelet function cðtÞ corre-

sponding to different scales of observation. The parameter b

can be interpreted as a temporal translation or shift of the

function cðtÞ.For the time series fðtÞeL2ðRÞ or finite energy signal

(Rosso et al. 2004) the continuous wavelet transform of the

time series f(t) is defined as

Wfða; bÞ ¼ aj j�12

Z þN�N

fðtÞc� t� ba

� �dt ð2Þ

where Wf (a, b) is the wavelet coefficient and * corresponds

to the complex conjugate.

The wavelet transform searches for correlations between

the signal and wavelet function. This calculation is done at

different scales of a and locally around the time of b. The

result is a wavelet coefficient Wf(a, b) contour map known as

a scalogram. However, computing the wavelet coefficients at

every possible scale (resolution level) causes the generation of

a large amount of data. If one chooses the scales and the

positions based on the powers of two (dyadic scales and

positions), then the analysis will be much more efficient as

well as more accurate. This transform is called the discrete

wavelet transform (DWT) and can be defined as (Mallat 1989)

cm;nt� b

a

� �¼ a�m=2

o c�t� nboam

o

amo

� �ð3Þ

where m and n are integers that control the wavelet dilation

and translation, respectively, a0 is a specified fined dilation

step greater than 1 and b0 is the location parameter which

must be greater than zero. The most common and simplest

choices for parameters are a0¼ 2 and b0¼ 1.

This power-of-two logarithmic scaling of the translations

and dilations is known as a dyadic grid arrangement and is

the simplest and most efficient case for practical purposes

(Mallat 1989). For a discrete time series f(t), when it occurs at

a different time t (i.e. here integer time steps are used), the

discrete wavelet transform becomes

Wfðm;nÞ ¼ 2�m=2XN�1

t¼0

fðtÞc�ð2�mi� nÞ ð4Þ

where Wf(m, n) is the wavelet coefficient for the discrete

wavelet of scale a¼ 2m and location b¼ 2mn. f(t) is a finite

time series (t¼ 0, 1, 2,y, N�1), N is an integer power of 2

(N¼ 2M) and n is the time translation parameter, which

changes in the range 0ono2M�m�1, where 1omoM. At the

largest wavelet scale (i.e. 2m where m¼M) only one wavelet is

required to cover the time interval and only one coefficient is

produced. At the next scale (2m�1), two wavelets cover the time

interval, hence two coefficients are produced, and so on down

to m¼ 1. At m¼ 1, the a scale is 21, i.e. 2M�1 or N/2 coefficients

are required to describe the signal at this scale. The total

number of wavelet coefficients for a discrete time series of

length N¼ 2M is then 1þ 2þ 4þ 8 þ y þ 2M�1 ¼ N�1.

In addition to this, a signal smoothed component, W , is

left, which is the signal mean. Thus, a time series of length N

is broken into N components, i.e. with zero redundancy. The

inverse discrete transform is given by

fðtÞ ¼W þXMm¼1

X2M�m�1

n¼0

Wfðm;nÞ2�m2c�ð2�mi� nÞ ð5Þ

or in a simple format as

fðtÞ ¼WðtÞ þXMm¼1

WmðtÞ ð6Þ

504 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

where WðtÞ is called the approximation sub-signal at level M

and Wm(t) are detailed subsignals at levels m¼ 1, 2, y,M.

DWT operates two sets of functions viewed as high-pass

and low-pass filters (Figure 1). The original time series is

passed through high-pass and low-pass filters and separated

at different scales. The time series is decomposed into one

comprising low frequencies or its trend (the approximation)

and one comprising the high frequencies or the fast events

(the detail). The wavelet coefficients, Wm(t) (m ¼ 1, 2, y, M),

provide the detail signals, which can capture small features of

interpretational value in the data; the residual term, WðtÞ,represents the background information (approximation) of

data. Because of the simplicity of W1(t), W2(t), y, WM(t),

WðtÞ, some interesting characteristics, such as period, hidden

period, dependence and jump can be diagnosed easily

through these discrete wavelet components (DWCs).

Bootstrapped artificial neural networks (BANNs)

The bootstrap is a data-driven simulation method that uses

intensive resampling, with replacement, to reduce uncertain-

ties (Efron 1979; Efron & Tibishirani, 1993). The technique is

based on resampling with replacement of the available data-

set and training an individual network on each resampled

instance of the original dataset. The bootstrap method can be

used to expand upon a single realization of a distribution or

process to create a set of bootstrap samples that can provide a

better understanding of the average and variability of the

original unknown distribution or process.

Assume that the data consists of a random sample

Tn¼ {(x1, y1), (x2, y2), y., (xn, yn)} of size n drawn from a

population of unknown probability distribution F, where ti (xi,

yi) is a realization drawn independently and identically dis-

tributed (i.i.d.) from F and consists of a predictor vector xi and

the corresponding output variable yi. Let F be the empirical

distribution function for Tn with mass 1/n on t1, t2, y, tn; and

let T�

be a random sample of size n taken from i.i.d. with

replacement from F. The set of B bootstrap samples can be

represented as T1, T2,y, Tb,y, TB, in which B is the total

number of bootstrap samples and ranges usually from 50–200

(Efron 1979). For each Tb, an ANN prediction model is

constructed and the output is represented as

fANNðxi;wb=TbÞ, built using all n observations. The perfor-

mance of the trained ANN model is evaluated using the

observation pairs that are not included in a bootstrap sample

and the average performance of these ANNs on their corre-

sponding validation sets is used as an estimate of the general-

ization error of the ANN model developed on Tn. The

generalization error of an ANN model can be estimated by

its ‘E0’ estimate (Twomey & Smith 1998):

E0 ¼

PBb¼1

PieAbðyi � fANNðxi;wb=TbÞÞ2

PBb¼1ðAbÞ

ð7Þ

The output of the ANN developed based on the bootstrap

sample Tb is represented as fANN(x, wb/Tb), where x is a

particular input vector, wb is the weight vector, Ab is the set of

indices for the observation pairs not included in the bootstrap

sample Tb and #(Ab) is the number of observation pair indices

in Ab. The BANN estimate y xð Þ is given by the average of the

B bootstrapped estimates:

y xð Þ ¼ 1B

XBb¼1

fANN x;wb=Tb

� �ð8Þ

Multiple linear regression (MLR)

Multiple linear regression attempts to model the relationship

between two or more independent variables and a dependent

variable by fitting a linear equation to the data points. A

multiple linear regression equation takes the following form:

y ¼ aþ b1�1 þ b2�2 þyþ bnxn ð9Þ

where y is the dependent variable, a is a constant and b1 to bn

are multipliers for x1 to xn independent variables. Constant

and multipliers are estimated through minimizing the sums of

Time series

High-pass filter

Low-pass filter

Details

Approximation

Figure 1 9999 DWT decomposition of a time series.

505 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

squares of deviations between each data point and the

regression line. MLR has been the traditional approach

utilized in water resources hydrology for several decades

since the last century. Some recent applications appear in

Leclerca & Ouarda (2007) and Sahoo et al. (2009).

Performance indices

Dawson et al. (2007) discussed 20 performance measures

generally used in hydrological forecasting. In this study, we

have selected four performance measures to evaluate model

performance. The Nash–Sutcliffe efficiency (E), root mean

square error (RMSE), mean absolute error (MAE) and per-

sistence index (PERS) performance measures are used to

evaluate the accuracy of the developed models. The Nash–

Sutcliffe efficiency (E), introduced by Nash & Sutcliffe (1970),

is still one of the most widely used criteria for the assessment

of model performance. E provides a measure of the ability of

a model to predict values that are different from the mean. It

records as a ratio the level of overall agreement between the

observed and modelled datasets and, as it is sensitive to

differences in the observed and modelled means and var-

iances, it has been strongly recommended (e.g. NERC 1975;

ASCE 1993). RMSE and MAE provide different types of

information about the predictive capabilities of the model.

The RMSE measures the goodness-of-fit relevant to high flow

values whereas the MAE is not weighted towards high(er)

magnitude or low(er) magnitude events, but instead evaluates

all deviations from the observed values, in both an equal

manner and regardless of sign. PERS is the substitution of the

last known value as the current prediction and represents a

good benchmark against which other predictions can be

measured (Cannas et al. 2006; Kitanidis & Bras 1980).

(i) Nash–Sutcliffe coefficient (E). It is expressed as

E ¼ 1�

Pni¼1ðOi � PiÞ2

Pni¼1ðOi �OiÞ2

ð10Þ

where Oi and Pi are the observed and predicted flow, Oi is the

mean of the observed flow and n is the number of data points.

The value of the Nash–Sutcliffe coefficient varies between

�N to 1. The closer the value to 1, the better is the model

performance.

(ii) Root mean square error (RMSE). It is expressed as

RMSE ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi1n

Xn

i¼1

ðOi � PiÞ2vuut ð11Þ

(iii) Mean absolute error (MAE). It is expressed as

MAE ¼ 1n

Xn

i¼1

Oi � Pij j ð12Þ

(iv) Persistence index (PERS). It is expressed as

PERS ¼ 1� SSESSEnaive

ð13Þ

where

SSE ¼Xn

i¼1

ðOi � PiÞ2 ð14Þ

and

SSEnaive ¼Xn

i¼1

ðOi �Oi�LÞ2 ð15Þ

in which the SSE terms are the sum of square errors, Oi�L is

the observed discharge at time i minus the lead time L.

Persistence consists of a comparison between the model

under study and the naive model where a value of PERS

smaller than or equal to 0 indicates that the model under

study performs worse or no better than the easy-to-implement

naive model. A PERS value of 1 is obtained when the model

under study provides exact estimates of observed discharge.

STUDY AREA AND DATA USED

The Mahanadi river basin, which is the fourth largest river

basin in India, was selected for this study. The Mahanadi

River flows to the Bay of Bengal in east-central India draining

an area of 141 589 km2 and has a length of 851 km. It lies

between east longitudes 801300 to 861500 and north latitudes

191210 to 231350. About 53% of the basin is in the state of

Chhatisgarh, 46% is in the coastal state of Orissa and the

remainder of the basin is in the states of Jharkhand and

Maharastra. Numerous dams, irrigation projects and barrages

506 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

are present in the Mahanadi river basin, the most prominent

of which is the Hirakud reservoir. The middle reaches of the

lower Mahanadi river basin, located in Orissa between 821E

191N and 861E 221N and encompassing a geographical area

of 47 558.6 km2, forms the study area (Figure 2). The main

river reach extends from Hirakud Dam to Naraj, having a

total length of 358.4 km. The main soil types found in the

study area are red and yellow soils. The normal annual

rainfall is 1458 mm and temperature in this region varies

from 14 1C to 40 1C. The average monthly pan evaporation of

the area varies from 2.4 mm to 14.6 mm. Most of the rainfall

and river flow occur during the monsoon season, between

June and September. In the delta region of the Mahanadi

river basin almost every year flooding is a serious problem

during monsoon seasons. Naraj, which is situated at the

mouth of the delta, was selected for daily discharge forecasting.

The data used for this study consists of daily discharge

data of seven upstream gauging stations during the period

2000–2006. Some of the statistical properties of the discharge

data are presented in Table 1. The locations of different

gauging stations are shown in Figure 2. Seven years of daily

discharge data of seven discharge gauging stations for

the monsoon period (17 July to 25 September) from years

2000–2006 (497 sets) are divided into three datasets. Daily

discharge of years 2000–2004 (355 patterns) are taken for

training, 2005 (71 patterns) for cross-validation and 2006 (71

patterns) for testing. The training dataset serves for model

training and the testing dataset is used to evaluate the

performances of the models. The cross-validation set helps

to implement an early stopping approach in order to avoid

overfitting of the training data.

MODEL DEVELOPMENT

One of the most important steps in the ANN hydrologic

model development process is the determination of signifi-

cant input variables. The current study used a statistical

approach suggested by Sudheer et al. (2002) to identify the

appropriate input vectors. The method is based on the heur-

istic that the potential influencing variables corresponding to

different time lags can be identified through statistical analy-

sis of the data series that uses cross-correlation (CCF), auto-

correlation (ACF) and partial autocorrelation functions

(PACF) between the variables. This process is applied to

select significant inputs from the discharge data of seven

Figure 2 9999 Index map of the middle reaches of Mahanadi river basin showing location of different gauging stations.

507 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

discharge gauging stations for daily river flow forecasting. The

total input vectors identified are presented in Table 2.

The identification of the optimal network geometry is one

of the major tasks in developing an ANN model. While the

number of input output nodes is problem-dependent, there is

no direct and precise way of determining the optimal number

of hidden nodes. The model architecture of an ANN is

generally selected through a trial-and-error procedure as

currently there is no established methodology for selecting

the appropriate network architecture prior to training. The

optimum numbers of hidden neurons are calculated using the

generalization error. The generalization errors (Equation (7))

of various ANN structures are representative of the ANN

performance (Jia & Culver 2006). The ANN structures are

tested for 1–6 hidden neurons and the structure with 4 hidden

neurons for which the generalization error is lowest is chosen

as the optimal structure.

Four ANN models developed in this study are traditional

ANN, wavelet-based ANN (WANN), bootstrap-based ANN

(BANN) and bootstrap–wavelet ANN conjunction model

(BWANN). At first, a multi-layer perceptron (MLP) feed-

forward ANN model is developed. The ANN models are

Table 1 9999 Statistics of the datasets for daily river flow forecasting

Year Statistics Khairmal Hirakud Release Salebhata Kesinga Kantamal Tikarpara Naraj

2000 Mean (m3/s) 1792.9 784.9 25.0 359.8 569.5 2034.8 2296.9

Standard deviation (m3/s) 897.7 701.6 19.2 367.0 645.9 834.5 641.6

Maximum (m3/s) 5720.0 4601.3 101.2 2452.0 3661.0 4774.8 5050.0

Minimum (m3/s) 509.7 132.9 4.3 113.5 105.0 600.0 1090.0

2001 Mean (m3/s) 6983.8 4057.8 269.2 1049.5 1627.8 6035.1 9932.2

Standard deviation (m3/s) 6383.9 4401.3 431.6 940.6 1187.3 5568.9 9031.4

Maximum (m3/s) 30,468.9 21,567.3 3225.0 5293.0 6500.0 26,700.0 36,714.7

Minimum (m3/s) 894.8 474.5 8.6 116.5 336.7 1369.6 1842.1

2002 Mean (m3/s) 2230.1 1187.6 141.1 253.0 470.0 2206.6 3111.9

Standard deviation (m3/s) 2801.1 1964.7 243.9 327.1 579.5 2412.0 3583.7

Maximum (m3/s) 14,724.8 10,329.4 1545.4 1990.9 2900.0 12,305.6 16,630.3

Minimum (m3/s) 305.8 43.2 1.8 10.0 20.2 436.8 215.1

2003 Mean (m3/s) 9038.2 4887.2 664.5 1256.2 1645.7 7533.6 11,701.5

Standard deviation (m3/s) 8322.0 5306.2 1154.0 1327.7 2061.4 6445.3 10,014.6

Maximum (m3/s) 34,150.1 24,267.2 7916.0 8908.4 12,915.1 25,062.0 35,253.2

Minimum (m3/s) 1257.3 471.8 4.5 308.6 385.2 794.4 2620.3

2004 Mean (m3/s) 4581.7 2613.9 190.5 725.0 946.6 4208.4 5513.7

Standard deviation (m3/s) 4620.5 3240.1 236.2 633.9 687.2 3844.7 5452.1

Maximum (m3/s) 20,331.5 14,952.0 1004.0 3240.6 4014.0 17,744.4 22,465.4

Minimum (m3/s) 1121.3 538.5 13.6 182.6 345.5 1139.1 1320.7

2005 Mean (m3/s) 5570.8 3323.0 150.8 665.1 1001.2 4950.3 8229.0

Standard deviation (m3/s) 5565.3 3712.7 284.9 1210.2 1783.1 4141.0 7006.9

Maximum (m3/s) 26,249.7 13,196.6 1924.0 8120.8 11,030.1 19,000.0 24,399.1

Minimum (m3/s) 849.5 393.9 1.0 113.4 131.8 1130.0 1274.9

2006 Mean (m3/s) 7608.3 3343.9 573.8 1274.3 2138.1 7127.6 11,979.8

Standard deviation (m3/s) 5745.6 3036.7 836.8 1307.5 2985.1 5449.5 8737.6

Maximum (m3/s) 27,099.2 11,870.3 4681.4 6483.1 15,276.2 29,000.0 33,979.2

Minimum (m3/s) 1659.4 240.7 9.3 167.9 305.6 923.8 696.7

508 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

developed using the most significant inputs which are first

log-transformed and then linearly scaled to the range (0, 1)

for ANN modeling (Campolo et al. 1999). The computational

efficiency of the training process is an important considera-

tion for ANN modeling. A computationally efficient second-

order training method, the Levenberg–Marquardt method, is

used to minimize the mean squared error between the fore-

cast and observed river flows. In the next step, the DWC data

are used as inputs to the ANN model to develop the WANN

model. The DWT decomposes an original discharge time

series into many components (i.e. DWCs) at different scales

(or frequencies). Each component plays a distinct role in the

original flow series. The low-frequency component generally

reflects the identity (periodicity and trend) of the signal

whereas the high-frequency component uncovers details

(Kucuk & Agiralloglu 2006). The wavelet function is derived

from the family of Daubechies wavelets (Nourani et al. 2009;

Wu et al. 2009). To choose the number of decomposition level

or DWCs the following formula is used to determine the

decomposition levels (Aussem et al. 1998; Nourani et al.

2008):

L ¼ int ½logðNÞ� ð16Þ

where L and N are decomposition level and number of time

series data, respectively. This study uses N¼ 497, which

produces L¼ 3 approximately. The WANN models are devel-

oped employing subseries DWCs obtained using DWT. For

this purpose, firstly, the original time series is decomposed

into three levels of DWCs by DWT. Three levels of decom-

position (i.e., D1, D2 and D3) and approximation (A3) for

discharge data of Khairmal and Hirakud releases are shown

in Figure 3. The effective DWCs are determined using the

correlation coefficients between each wavelet components

and the observed discharge at Naraj. Correlation between the

periodic component and the original discharge data reveals

the effectiveness of the component. Table 3 shows that the

significant periodic components are A3 and D3 for Khairmal,

Hirakud release data and Kantamal; A3, D2 and D3 for

Tikarpara; A3, D1, D2 and D3 for Naraj; and only approx-

imation A3 for Salebhata and Kesinga. These constitute the

new wavelet discharge series. The significant wavelet compo-

nents of a particular gauging station are added and employed

to constitute the new inputs of the WANN for daily discharge

forecasting. The newly constructed time series is used as

WANN model inputs. The BANN model is developed as an

ensemble of several ANNs built using bootstrap resamples of

raw datasets, whereas the WBANN model is developed as an

ensemble of several ANNs built using bootstrap resamples of

DWCs instead of raw datasets. In this way the WBANN

model uses the capabilities of both bootstrap resampling and

wavelet transformation technique. Similar to the BANN

model, the WBANN model is also developed using 200

resamples to maintain consistency. Bootstrap.xla, an Excel

add-in (Barreto & Howland 2006), is used to generate boot-

strap resamples of raw datasets and DWCs for the BANN and

WBANN model development, respectively. A simple flow-

chart depicting the development of ANN, BANN, WANN

and WBANN models is shown in Figure 4. The number of

inputs for all four models is taken as the same to maintain

consistency. Similar to ANN the WANN, BANN and

WBANN model structures are tested for 1–6 hidden neurons

and all three models with 4 hidden neurons for which the

generalization errors are minimum is chosen as the optimal

structure. Therefore, for all four models the number of hidden

neurons is 4.

DAILY DISCHARGE FORECASTING

Figure 5–7 show the hydrograph and scatter plot of observed

and predicted discharges using WBANN for 1-, 3-, and 5-day

lead time forecasts for the testing period. The figures show

that WBANN forecasts are in good agreement with the

observed data. For 1-day lead time forecasts the observed

and predicted values are in good agreement as the predicted

Table 2 9999 Most significant input variables selected using cross-correlation statistics (CCF,

ACF, PACF)

Stations Lags (d)

Naraj Qt�1, Qt�2

Tikarpara Qt�1, Qt�2

Khairmal Qt�1, Qt�2, Qt�3

Kantamal Qt�2, Qt�3

Hirakud Release Qt�1, Qt�2, Qt�3

Salebhata Qt�1, Qt�2, Qt�3

Kesinga Qt�2, Qt�3

509+ M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

0

8000

16000

24000

32000

Dis

char

ge

(m/s

)

Time (day)

Original

0

4000

8000

12000

16000

20000

Dis

char

ge

(m/s

)

Time (day)

A3

−16000

−8000

0

8000

16000

Dis

char

ge

(m/s

)

Time (day)

D1

−12000

−8000

−4000

0

4000

8000

12000

Dis

char

ge

(m/s

)

Time (day)

D2

−4000

0

4000

Dis

char

ge

(m/s

)

Time (day)

D3

(a) (b)

Khairmal

0

4000

8000

12000

16000

Dis

char

ge

(m/s

)Time (days)

Original

−4000

0

4000

8000

Dis

char

ge

(m/s

)

Time (days)

A3

−2000

0

2000

Dis

char

ge

(m/s

)

Time (days)

D1

−8000

−4000

0

4000

8000

Dis

char

ge

(m/s

)

Time (days)

D2

−8000

−4000

0

4000

8000

Dis

char

ge

(m/s

)

Time (days)

D3

Hirakud release data Figure 3 9999 The discrete wavelet components of the discharge series of (a) Khairmal and (b) Hirakud release data for the year 2006.

510 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

values show the general behavior of the observed discharge.

For longer lead times, model predictions gradually deterio-

rate. This is because, for longer lead time, the river flow values

contain less information than for the shorter lead times for

modeling discharge values at the longer time horizon. The

results of the WBANN model in terms of different perfor-

mance indices for the testing period is shown in Table 4. The

RMSE statistic is a measure of residual variance and is

indicative of the model’s ability to predict high flows. Con-

sidering a high value of 33 979.2 m3/s and a very low value of

696.7 m3/s at the Naraj gauging station, the WBANN model

with RMSE value of 4981.3 m3/s performed satisfactorily up

to 5-day lead forecasts.

Table 5 shows the performance of ANN, WANN and

BANN models in terms of E, RMSE and MAE. It is clear from

the Tables 4 and 5 that for 1–5-day lead time forecasts the

WBANN and WANN models perform better than the tradi-

tional ANN and BANN. The time of concentration of the

basin is about 3 day. Therefore, the performance of the model

is not very good for longer lead time forecasts (i.e. 4- and

5-day lead). Performance of the WANN model for 4-day lead

time and performance of the WBANN model for 4- and 5-day

lead times are still significantly better than ANN and BANN,

as these models simulate the antecedent information for lead

time equal to the time of concentration by utilizing the

capability of wavelet and bootstrap techniques. Figures 8–10

show the hydrographs and scatter plots of observed and

predicted discharges using ANN, WANN, BANN and

WBANN for 1-, 3-, and 5-day lead time forecasts for the

testing period. It is obvious from the figures that for longer

lead time (41 day) the traditional ANN and BANN models

underestimate the observed data, whereas the WANN and

WBANN models are consistent and do not significantly

underestimate or overestimate the observed values. It is

observed from the figures that for 1-, 3- and 5-day lead time

forecasts the WANN and WBANN models forecast higher

discharge values better than the ANN and BANN models.

Moreover, performance of the WANN and WBANN models

in predicting peak values for 1- and 3-day lead time forecasts

is almost similar, even though the 5-day lead time forecast

Table 3 9999 Correlation coefficients between the discrete wavelet components and the observed discharge data at Naraj

Discrete wavelet components

Gauge stations

Khairmal Hirakud Release Salebhata Kesinga Kantamal Tikarpara Naraj

A3 0.91 0.87 0.75 0.60 0.66 0.92 0.93

D1 �0.04 �0.04 �0.01 �0.04 �0.02 0.01 0.14

D2 0.06 0.03 0.00 �0.01 0.00 0.13 0.19

D3 0.21 0.13 0.07 0.09 0.11 0.23 0.27

Original 0.88 0.81 0.52 0.39 0.50 0.93 1.00

Daily discharge time series

Significant input

WANN development

DWT

Bootstrapresampling

Bootstrapresampling

WBANNdevelopment

BANN development

ANNdevelopment

DWCs

Figure 4 9999 Flowchart showing the development of ANN, BANN, WANN and WBANN models.

511 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

performance of the WANN model is significantly better

compared to the WBANN model. The reason may be that

the WBANN model is the ensemble of several (in this study

200) ANNs built using resamples of DWCs. There may be a

chance that the high discharge values, which are very less, are

not included in any of the 200 bootstrap resamples for 5-day

lead time forecasts. The WANN model developed with newly

constructed time series using DWT is found to be better

compared to the simple ANN and BANN models for daily

discharge forecasting. This shows the capabilities of DWT in

0

10000

20000

30000

40000

Dis

char

ge

(m/s

)

Time (day)

ObservedWBANN

(a)

0

10000

20000

30000

40000

400003000020000100000

Pre

dic

ted

dis

char

ge

(m/s

)

Observed discharge (m /s)

WBANN

1:1 Line

(b)

Figure 5 9999 Hydrograph (a) and scatter plot (b) of observed and predicted discharge of

testing dataset for Naraj using the WBANN model for 1-day lead time.

0

10000

20000

30000

40000

Dis

char

ge

(m/s

)

Time (day)

ObservedWBANN

(a)

(b)

0

10000

20000

30000

40000

0 40000300002000010000

Pre

dic

ted

dis

char

ge

(m/s

)

Observed discharge (m /s)

WBANN

1:1 Line

Figure 6 9999 Hydrograph (a) and scatter plot (b) of observed and predicted discharge of

testing dataset for Naraj using the WBANN model for 3-day lead time.

0

10000

20000

30000

40000

Dis

char

ge

(m/s

)

Time (day)

ObservedWBANN

(a)

0

10000

20000

30000

40000

0 40000300002000010000

Pre

dic

ted

dis

char

ge

(m/s

)

Observed discharge (m /s)

WBANN

1:1 Line

(b)

Figure 7 9999 Hydrograph (a) and scatter plot (b) of observed and predicted discharge of

testing dataset for Naraj using the WBANN model for 5-day lead time.

Table 4 9999 Performance indices for 1–5-day lead time forecasts for the WBANN model

WBANN

Lead time (d) E (%) RMSE (m3/s) MAE (m3/s)

1 93.5 2247.9 1628.5

2 89.0 2921.6 2312.4

3 85.4 3367.2 2569.8

4 69.8 4850.9 3747.5

5 68.2 4981.3 3832.1

512 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

daily discharge forecasting. WANN forecasts are found to be

more accurate due to the fact that the features (such as

periodicity) of the subseries are obvious (Ning & Yunping

1998). It is to be clarified that, even though WANN for all the

lead times and ANN for 3- and 5-day lead times perform

better than BANN, the BANN and WBANN models are

more reliable and consistent. This is because the ANN

model is uncertain and is not reproducible. Even a change

in the arrangement of training, cross-validation and testing

datasets lead to a different performance of ANN and WANN

models and it cannot be predicted. On the other hand, the

BANN and WBANN models are consistent and more reliable

even if the arrangement of datasets for training, cross-

validation and testing changes. This is the reason why

ensemble forecasting, by averaging all the 200 models, are

taken instead of selecting a best model. The BANN model is

the aggregation of several (200 in this study) ANN models,

each developed on resampled original time series. As the

BANN model is developed on different realizations of the

actual dataset using the bootstrap method, the ensemble pre-

dictions of these competing models lead to comprehensive,

unbiased and realistic predictions. For longer lead time, the

performance of the WBANN model is significantly better

than the ANN, BANN and WANN models. Overall perfor-

mance of the WBANN model is found to be superior due to

the reason that it is developed by generating 200 bootstrap

resamples of DWCs instead of a raw dataset as for BANN. In

this way the WBANN model uses the capabilities of both

bootstrap and wavelet techniques by taking inputs which are

generated by resampling the newly constructed time series

using DWCs.

In order to assess the performance of models to predict

discharge values for different lead time forecasts for different

magnitude flows, a ‘‘partitioning analysis’’ (Jain & Srinivasulu

2004) is carried out by dividing the total discharge values into

low-, medium- and high- magnitude discharges. Table 6

presents the partitioning of discharge values for testing per-

iods based on the relative spread of the discharge from the

mean. It has been reported that the coefficient of efficiency

can be high (80 or 90%) even for poor models, and the best

models do not produce values which, on first examination,

are impressively higher (Legates & McCabe 1999;

Table 5 9999 Performance indices for 1–5-day lead time forecasts for ANN, WANN and BANN

models

Lead time (d) E (%) RMSE (m3/s) MAE (m3/s)

ANN

1 89.4 2873.8 1824.6

2 73.9 4510 2861.9

3 60.9 5517.9 3513.8

4 39.2 6883.4 4784.3

5 25.7 7607.3 5429.7

WANN

1 92.5 2421.3 1678.8

2 81.7 3771 2604.4

3 72.1 4665.4 3207.1

4 66.8 5082.5 3705.5

5 39.8 6847 5293.8

BANN

1 91.2 2619.8 1714.4

2 72.3 4647.4 2943.2

3 57.5 5753.5 3515.8

4 38.0 6951.6 4662.7

5 22.1 7791.9 5481.6

0

10000

20000

30000

40000

Dis

char

ge

(m/s

)

Time (day)

ObservedANNWANNBANNWBANN

(a)

0

10000

20000

30000

40000

0 40000300002000010000

Pre

dic

ted

dis

char

ge

(m/s

)

Observed discharge (m /s)

ANN

WANN

BANN

WBANN

1:1 Line

(b)

Figure 8 9999 Hydrograph (a) and scatter plot (b) of observed and predicted discharge of

testing dataset for Naraj using the ANN, WANN, BANN and WBANN models for

1-day lead time.

513 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

Krause et al. 2005). The RMSE statistic indicates only the

model’s ability to predict a value away from the mean (Hsu

et al. 1995). Therefore, it is important to test the model using

some other performance evaluation criteria such as threshold

statistics (Nayak et al. 2005 b). The threshold statistics (TS)

not only give the performance index in terms of predicting

discharges but also the distribution of the prediction errors.

The threshold statistic for a level of x% is a measure of

consistency in forecasting errors from a particular model. The

threshold statistics is represented as TSx and expressed as

a percentage. This criterion can be expressed for different

levels of absolute relative error (ARE) from the model. ARE is

given as

AREi ¼Oi � Pi

Oi

��������� 100 ð17Þ

TSx is computed for x% level as

TSx ¼Yx

n� 100 ð18Þ

where Yx is the number of computed discharges (out of n total

computed) for which the absolute relative error is less than

x% from the model.

Table 7 shows the distribution of forecast error predicted

from different models across different error thresholds for

1–5-day lead time forecasts for low, medium and high discharge

profiles. Even though there are only two high discharge

values, which constitute only 2.8% of the total testing dataset

(Table 6), it is observed that the performance of the WBANN

and WANN models for high flow forecasts is better than for

ANN and BANN. It may be due to the reason that wavelets

are reducing the noise in the discharge time series, making it

easier to predict the discharge whereas bootstrapping is

reducing the variance. Noise in high discharge may be due

0

10000

20000

30000

40000

Dis

char

ge

(m/s

)

Time (day)

ObservedANNWANNBANNWBANN

(a)

0

10000

20000

30000

40000

0 40000300002000010000

Pre

dic

ted

dis

char

ge

(m/s

)

Observed discharge (m /s)

ANN

WANN

BANN

WBANN

1:1 Line

(b)

Figure 9 9999 Hydrograph (a) and scatter plot (b) of observed and predicted discharge of

testing dataset for Naraj using the ANN, WANN, BANN and WBANN models for

3-day lead time.

–10000

0

10000

20000

30000

40000

Dis

char

ge

(m/s

)

Time (day)

ObservedANNWANNBANNWBANN

(a)

0

10000

20000

30000

40000

0 40000300002000010000

Pre

dic

ted

dis

char

ge

(m/s

)

Observed discharge (m /s)

ANN

WANN

BANN

WBANN

1:1 Line

(b)

Figure 10 9999 Hydrograph (a) and scatter plot (b) of observed and predicted discharge of

testing dataset for Naraj using the ANN, WANN, BANN and WBANN models for

5-day lead time.

Table 6 9999 Number of data points in low-, medium- and high-discharge categories

Category Number of pointsPercentage of thetotal data

Low (xom) 38 53.5

Medium (mrxrmþ 2s) 31 43.7

High (x4mþ 2s) 2 2.8

Total 71 100

m is the mean and s is the standard deviation.

514 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

to the reason that the discharge values are computed using

the rating curves; however, since rating curves are developed

with limited stage-discharge measurements and since mea-

surements of high flows are rare, there could be significant

errors in rating curves at high levels (Sahoo & Ray 2006). This

error or noise in high discharge is well approximated using

discrete wavelet decomposition. Performance of the

WBANN model for medium flow forecasts is significantly

better compared to the remaining three models as the

WBANN model can forecast 93.5%, 80.6%, 74.2%, 67.5%

and 64.5% of discharge values within 25% relative error for

1–5-day lead time forecast, respectively, whereas the WANN

model can forecast 77.4%, 64.5%, 71.0%, 58.1% and 35.5% of

discharge values within 25% relative error. All four models

perform more or less similarly during low discharge. It is

observed that medium discharges are simulated more satis-

factorily compared to low and high discharges profiles. The

reason may be that the low discharge data are influenced by

the unsaturated condition of the basin and the high discharge

values do not have proper representation in training the

models, whereas medium discharge values are not influenced

much by the unsaturated condition of the basin and have

proper representation in training datasets.

In order to compare ANNs with an empirical model,

we also developed a multiple linear regression (MLR)

model for daily discharge forecasting. The discharge at

Table 7 9999 Threshold statistics for all four models for testing period

Low flow Medium flow High flow

TS(%)

1-dlead

2-dlead

3-dlead

4-dlead

5-dlead

1-dlead

2-dlead

3-dlead

4-dlead

5-dlead

1-dlead

2-dlead

3-dlead

4-dlead

5-dlead

WBANN

5 18.4 5.3 0.0 2.6 5.3 29.0 19.4 19.4 12.9 12.9 0.0 0.0 0.0 0.0 0.0

10 26.3 7.9 0.0 2.6 10.5 51.6 38.7 35.5 29.0 35.5 100.0 50.0 0.0 0.0 0.0

20 50.0 28.4 21.1 13.2 13.2 77.4 71.0 61.3 41.9 54.8 100.0 100.0 50.0 50.0 50.0

25 65.8 41.1 26.3 22.4 18.4 93.5 80.6 74.2 67.5 64.5 100.0 100.0 100.0 50.0 50.0

50 86.8 71.1 68.4 47.4 39.5 100.0 100.0 100.0 93.5 90.3 100.0 100.0 100.0 100.0 100.0

BANN

5 10.5 13.2 5.3 7.9 2.6 22.6 16.1 6.5 6.5 3.2 0.0 0.0 0.0 0.0 0.0

10 21.1 13.2 26.3 21.1 18.4 61.3 29.0 29.0 6.5 6.5 0.0 0.0 0.0 0.0 0.0

20 55.3 34.2 39.5 34.2 31.6 80.6 54.8 51.6 19.4 16.1 50.0 50.0 50.0 0.0 0.0

25 63.2 54.7 44.7 41.8 39.5 93.5 67.7 58.1 32.3 19.4 50.0 50.0 50.0 0.0 0.0

50 92.1 78.9 68.4 52.6 55.3 100.0 93.5 90.3 77.4 54.8 100.0 100.0 50.0 0.0 0.0

WANN

5 7.9 5.3 2.6 10.5 0.0 35.5 16.1 16.1 9.7 12.9 50.0 0.0 0.0 0.0 100.0

10 28.9 15.8 7.9 15.8 5.3 58.1 41.9 35.5 25.8 12.9 100.0 100.0 0.0 0.0 100.0

20 34.2 34.2 23.7 31.6 15.8 71.0 61.3 58.1 48.4 25.8 100.0 100.0 50.0 100.0 100.0

25 52.6 44.7 31.6 42.1 23.7 77.4 64.5 71.0 58.1 35.5 100.0 100.0 100.0 100.0 100.0

50 86.8 68.4 47.4 63.2 34.2 100.0 90.3 90.3 83.9 90.3 100.0 100.0 100.0 100.0 100.0

ANN

5 18.4 7.9 7.9 10.5 5.3 25.8 16.1 9.7 6.5 9.7 50.0 0.0 0.0 0.0 0.0

10 23.7 23.7 18.4 13.2 13.2 58.1 38.7 22.6 12.9 12.9 50.0 0.0 0.0 0.0 0.0

20 63.2 44.7 36.8 26.3 23.7 77.4 58.1 45.2 25.8 22.6 50.0 50.0 50.0 0.0 0.0

25 71.1 47.4 57.9 31.6 26.3 83.9 64.5 54.8 35.5 29.0 50.0 50.0 50.0 0.0 0.0

50 90.7 74.9 78.9 57.9 55.3 100.0 93.5 93.5 87.1 61.3 100.0 100.0 50.0 0.0 0.0

515 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

Naraj station is selected as the dependent variable and the

input variables as of ANNs are selected as independent

variables. The best-fit model is estimated and selected

based on the highest coefficient of determination (R2). All

of the MLR models are first trained (to determine the

regression coefficients) using the data in the training set

(2000–2005) and then tested using the testing dataset

(2006). The SPSS software package (version 10, SPSS

Inc., Chicago, IL) is used for regression calculations. The

performance comparison of the ANN, WANN, BANN,

WBANN and MLR models is carried out with a simple

naive model in terms of persistence index, PERS. It is

obvious from Table 8 that the MLR model is better com-

pared to a simple naive model only for 1-day lead time

forecasts. All four ANN models (i.e. ANN, WANN, BANN

and WBANN) perform better compared to the MLR model

for 1–5-day lead time forecasts. It is clear that the PERS of

the WANN and WBANN models are greater than 0 and

therefore both the models are better compared to a simple

naive model for 1–5-day lead time forecasts. Performance

of ANN and BANN are better up to 4-day lead forecasts,

and negative PERS for 5-day lead time forecast indicate

that it is better to implement a simple naive model instead

of ANN and BANN. The performance of the WBANN

model is better than all the remaining models for all the

lead times forecast; these findings again show the strength

and capability of WBANN modeling for daily discharge

forecasting. Negative values of PERS for different models

show that for these lead times the models are not able to

extract any linear or nonlinear relationship and it is better

to implement a simple naive model. It is clear from these

findings that WBANN model performance is very good up

to 5-day lead time forecasts.

CONCLUSIONS

The aim of this paper is to introduce a conjunction model

WBANN based on wavelet and bootstrap techniques for

daily discharge forecasting. The decomposed time series of

the observed data present different periodic components.

Each of the wavelet components makes a distinct contribu-

tion to the original time series. The significant wavelet

components are selected based on the correlation between

the particular component series and the observed discharge

at Naraj and a new series is composed by adding

the significant components for each time series. These

series are employed as inputs to constitute the WANN

model for daily discharge forecasting. Use of wavelet com-

ponents supports the recent studies about the physics

involved in the ANNs (Jain et al. 2004; Sudheer & Jain

2004; Sudheer 2005). The significant inputs to the ANN

model are selected using the cross-correlation statistics.

The bootstrap resampling method is used to generate

different realizations of the datasets to create a set of

bootstrap samples that provide a better understanding of

the average and variability of the original unknown dis-

tribution or process. Two hundred bootstrapped ANN

models are generated to reduce the uncertainty involved

in ANN predictions. The inclusion of the DWCs in the

ANN input layer enabled consideration of the effective

periodic components in the discharge forecasting whereas

bootstrapping techniques are used to reduce the uncer-

tainty inherent in the conventional ANN modeling.

WBANN models are developed to incorporate the indivi-

dual capabilities of wavelet and bootstrap techniques. The

WBANN model is found to be superior to the traditional

ANN, BANN and WANN in terms of the selected perfor-

mance criteria for 1–5-day lead forecasts. The peak dis-

charge forecasts obtained by the WANN model are

noticeably closer to the observations compared to all

three remaining models. ARE and TS statistics show that

performance of the WBANN and WANN models for

high flow forecasts are better than ANN and BANN, the

performance of the WBANN model for medium flow fore-

casts is significantly better compared to the remaining three

models, whereas all four models perform more or less

similar during low discharge forecasts. This study shows

that the WBANN method is quite an appropriate tool for

Table 8 9999 Persistence index (PERS) for three different models for 1–5-day

lead time forecasts

Lead time (d) ANN WANN BANN WBANN MLR

1 0.40 0.58 0.50 0.63 0.3786

2 0.25 0.48 0.21 0.69 �0.0185

3 0.21 0.43 0.14 0.70 �0.2864

4 0.03 0.47 0.01 0.49 �0.4608

5 �0.02 0.17 �0.07 0.59 �0.3678

516 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

modeling the discharge for short as well as long lead time

forecasts.

REFERENCES

Abrahart, R. J. 2003 Neural network rainfall-runoff forecasting based oncontinuous resampling. J. Hydroinf. 5(1), 51–61.

Adamowski, J. F. 2008a River flow forecasting using wavelet and cross-wavelet transform models. Hydrol. Processes 22(25), 4877–4891.

Adamowski, J. F. 2008b Development of a short-term river floodforecasting method for snowmelt driven floods based on waveletand cross-wavelet analysis. J. Hydrol. 353(3–4), 247–266.

Aksoy, H., Guven, A., Aytek, A., Yuce, M. I. & Unal, N. E. 2007Discussion of generalized regression neural networks forevapotranspiration modelling by O Kisi (2006). Hydrol. Sci. J.52(4), 825–828.

Altunkaynak, A. 2007 Forecasting surface water level fluctuations ofLake Van by artificial neural networks. Wat. Res. Mngmnt. 21(2),399–408.

Anctil, F & Lauzon, N. 2004 Generalisation for neural networksthrough data sampling and training procedures, with applicationsto streamflow predictions. Hydrol. Earth Syst. Sci. 8(5), 940–958.

Anctil, F. & Tape, D. G. 2004 An exploration of artificial neuralnetwork rainfall-runoff forecasting combined with waveletdecomposition. J. Environ. Engng. Sci. 3(1), 121–128.

ASCE 1993 Criteria for evaluation of watershed models. J. Irrig. Drain. E119(3), 429–442.

ASCE Task Committee on Application of Artificial Neural Networks inHydrology 2000a Artificial neural networks in hydrology I:preliminary concepts. J. Hydrol. Engng. 5(2), 115–123.

ASCE Task Committee on Application of Artificial Neural Networks inHydrology 2000b Artificial neural networks in hydrology II:hydrologic applications. J. Hydrol. Engng. 5(2), 124–137.

Aussem, A., Campbell, J. & Murtagh, F. 1998 Wavelet-based featureextraction and decomposition strategies for financial forecasting.J. Comput. Intel. Finance 6(2), 5–12.

Baratti, R., Cannas, B., Fanni, A., Pintus, M., Sechi, G.M. & Toreno, N.2003 River flow forecast for reservoir management through neuralnetworks. NeuroComputing 55(3–4), 421–437.

Barreto, H. & Howland, F.M. 2006 Introductory Econometrics: UsingMonte Carlo Simulation with Microsoft Excel. CambridgeUniversity Press, Cambridge.

Bates, B.C. & Townley L.R. 1988 Nonlinear, discrete flood eventmodels. 3: Analysis of prediction uncertainty. J. Hydrol. 99(1�2),91–101.

Beven, K. 2006 A manifesto for the equifinality thesis. J. Hydrol.320(1�2), 18–36.

Bishop, C.M., 1995 Neural Networks for Pattern Recognition.Clarendon, Oxford.

Boucher, M.A., Perreault, L. & Anctil, F. 2009 Tools for the assessmentof hydrological ensemble forecasts obtained by neural networks.J. Hydroinf. 11(3–4), 297–307.

Campolo, M., Andreussi, P. & Soldati, A. 1999 River flood forecastingwith a neural network model. Wat. Res. Res. 35(4), 1191–1197.

Cannas, B., Fanni, A., See, L. & Sias, G. 2006 Data preprocessing forriver flow forecasting using neural networks: wavelet transformsand data partitioning. Phys. Chem. Earth 31(18), 1164–1171.

Cannon, A. J. & Whitfield, P. H. 2002 Downscaling recent streamflowconditions in British Columbia, Canada using ensemble neuralnetwork models. J. Hydrol. 259(1-4), 136–151.

Cigizoglu, H. K. 2003a Estimation, forecasting and extrapolation offlow data by artificial neural networks. Hydrol. Sci. J. 48(3),349–361.

Cigizoglu, H. K. 2003b Incorporation of ARMA models into flowforecasting by artificial neural networks. Environmetrics 14(4),417–427.

Coulibaly, P. & Evora, N. D. 2007 Comparison of neural networkmethods for infilling missing daily weather records. J. Hydrol.341(1�2), 27–41.

Daubechies, I. 1990 The wavelet transform, time-frequency localizationand signal analysis. IEEE Trans. Inf. Theory 36(5), 6–7.

Dawson, C. W., Abrahart, R. J. & See, L. M. 2007 HydroTest: aweb-based toolbox of evaluation metrics for the standardisedassessment of hydrological forecasts. Environ. Modell. Software22(7), 1034–1052.

Demirel, M. C., Venancio, A. & Kahya, E. 2009 Flow forecast bySWAT model and ANN in Pracana basin, Portugal. J. Hydrol.40(7), 467–473.

Dybowski, R. & Roberts, S. J. 2001 Confidence intervals and predictionintervals for feed forward neural networks. In: ClinicalApplications of Artificial Neural Networks (Dybowski, R. & Gant,V.). Cambridge University Press, Cambridge, pp. 298–326.

Efron, B. 1979 Bootstrap methods: another look at the jackknife. Ann.Statist. 7(1), 1–26.

Efron, B. & Tibshirani, R. J. 1993 An Introduction to the Bootstrap.Chapman and Hall, London.

Han, D., Kwong, T. & Li, S. 2007a Uncertainties in real-timeflood forecasting with neural networks. Hydrol. Processes 21(2),223–228.

Haykin, S. 1999 Neural Networks: A Comprehensive Foundation 2ndedn. Prentice Hall, Englewood Cliffs, NJ.

Hsu, K., Gupta, H. V. & Sorooshian, S. 1995 Artificial neural networkmodeling of the rainfall-runoff process. Wat. Res. Res. 31(10),2517–2530.

Jacquin, A. P. & Shamseldin, A. Y. 2009 Review of the application offuzzy inference systems in river flow forecasting. J. Hydroinf.11(3–4), 202–210.

Jain, A. & Srinivasulu, S. 2004 Development of effective and efficientrainfall–runoff models using integration of deterministic, real-coded genetic algorithms and artificial neural network techniques.Wat. Res. Res. 40, W04302.

Jain, A., Sudheer, K. P. & Srinivasulu, S. 2004 Identification of physicalprocesses inherent in artificial neural network rainfall runoffmodels. Hydrol. Processes 18(3), 571–581.

Jain, S. K., Das, D. & Srivastava, D. K. 1999 Application of ANN forreservoir inflow prediction and operation. J. Wat. Res. Plann.Mngmnt. 125(5), 263–271.

Jeong, D. & Kim, Y. O. 2005 Rainfall-runoff models using artificialneural networks for ensemble streamflow prediction. Hydrol.Processes 19(19), 3819–3835.

517 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

Jia, Y. & Culver, T. B. 2006 Bootstrapped artificial neural networks forsynthetic flow generation with a small data sample. J. Hydrol.331(3-4), 580–590.

Kim, T. W. & Valdes, J. B. 2003 Nonlinear model for droughtforecasting based on a conjunction of wavelet transforms andneural networks. J. Hydrol. Engng. 8(6), 319–328.

Kisi, O. 2008 Stream flow forecasting using neuro-wavelet technique.Hydrol. Processes 22(20), 4142–4152.

Kisi, O. 2009 Neural networks and wavelet conjunction model forintermittent streamflow forecasting. J. Hydrol. Engng. 14(8), 773–782.

Kitanidis, P. K. & Bras, R. L. 1980 Real-time forecasting with aconceptual hydrologic model: 2. Applications and results. Wat.Res. Res. 16(6), 1034–1044.

Koutsoyiannis, D. 2007 Discussion of ‘‘generalized regression neuralnetworks for evapotranspiration modelling’’ by O Kisi (2006).Hydrol. Sci. J. 52(4), 832–835.

Krause, P., Boyle, D. P. & Base, F. 2005 Comparison of differentefficiency criteria for hydrological model assessment. In: Proc. 8thWorkshop for Large Scale Hydrological Modelling, Oppurg 2004(Krause, P., Bongartz, K. & Flugel, W.-A. (Eds.)). Adv. Geosci. 5,89–97.

Kucuk, M. & Agiralloglu, N. 2006 Regression technique for stream flowprediction. J. Appl. Stat. 33(9), 943–960.

Lall, U. & Sharma, A. 1996 A nearest neighbor bootstrap for resamplinghydrologic time series. Wat. Res. Res. 32(3), 679–693.

Leclerca, M. & Ouarda, T. B. M. J. 2007 Non-stationary regional floodfrequency analysis at ungauged sites. J. Hydrol. 343(3-4), 254–265.

Legates, D. R. & McCabe, G. J. 1999 Evaluating the use the goodness-of-fit measure in hydrologic and hydroclimatic model validation.Wat. Res. Res. 35(1), 233–241.

Loague, L. & VanderKwaak, J. E. 2004 Physics-based hydrologicresponse simulation: platinum bridge, 1958 Edsel, or useful tool.Hydrol. Processes 18(15), 2949–2956.

Maier, H. R. & Dandy, G. C. 2000 Neural networks for the predictionand forecasting of water resources variables: a review of modellingissues and applications. Environ. Modell. Software 15(1),101–124.

Mallat, S. G. 1989 A theory for multi resolution signal decomposition:the wavelet representation. IEEE Trans. Pattern Anal. MachineIntel. 11(7), 674–693.

Mukerji, A., Chatterjee, C. & Raghuwanshi, N. S. 2009 Floodforecasting using ann, neuro-fuzzy, and neuro-GA models.J. Hydrol. Engng. 14(6), 647–652.

Muller, B. & Reinhardt, J. 1991 Neural Networks – An Introduction.Springer-Verlag, Berlin.

Nash, J. E. & Sutcliffe, J. V. 1970 River flow forcasting throughconceptual models. I. J. Hydrol. 10(3), 282–290.

Nason, G. P. & Von Sachs, R. 1999 Wavelets in time series analysis.Phil. Trans. R. Soc. Ser. A 357, 2511–2526.

Nayak, P. C., Sudheer, K. P. & Ramasastri, K. S. 2005a Fuzzy computingbased rainfall–runoff model for real time flood forecasting. Hydrol.Processes 19(4), 955–968.

Nayak, P. C., Sudheer, K. P., Rangan, D. M. & Ramasastri, K. S. 2004 Aneuro-fuzzy computing technique for modeling hydrological timeseries. J. Hydrol. 291(1–2), 52–66.

Nayak, P. C., Sudheer, K. P., Rangan, D. M. & Ramasastri, K. S. 2005bShort-term flood forecasting with a neurofuzzy model. Wat. Res.Res. 41(4), W04004.

NERC 1975 Flood Studies Report. Hydrological Studies vol. 1. NaturalEnvironment Research Council, London.

Ning, M., & Yunping, C. 1998 An ANN and wavelet transformationbased method for short term load forecast. Energy managementand power delivery. Int. Conf. 2, 405–410.

Nourani, V, Alami, M.T. & Aminfar, M.H. 2008 A combined neural-wavelet model for prediction of watershed precipitation,Ligvanchai, Iran. J. Environ. Hydrol. 16(2), 1–12.

Nourani, V., Komasi, M. & Mano, A. 2009 A multivariate ANN-waveletapproach for rainfall–runoff modeling. Wat. Res. Mngmnt. 23(14),2877-2894.

Partal, T. 2009 River flow forecasting using different artificial neuralnetwork algorithms and wavelet transform. Can. J. Civil Engng.36(1), 26–38.

Partal, T. & Cigizoglu, H. K. 2008 Estimation and forecasting of dailysuspended sediment data using wavelet–neural networks.J. Hydrol. 358(3-4), 317–331.

Partal, T. & Cigizoglu, H. K. 2009 Prediction of daily precipitation usingwavelet–neural networks. Hydrol. Sci. J. 54(2), 234–246.

Partal, T. & Kucuk, M. 2006 Long-term trend analysis using discretewavelet components of annual precipitations measurements inMarmara region (Turkey). Phys. Chem. Earth 31(18), 1189–1200.

Pramanik, N. & Panda, R. K. 2009 Application of neural network andadaptive neuro-fuzzy inference systems for river flow prediction.Hydrol. Sci. J. 54(2), 247–260.

Rao, A. R., Hamed, K. H. & Chen, H. L. 2003 Nonstationarities inHydrologic and Environmental Time Series. Kluwer, Dordrecht.

Rosso, O. A., Figliola, A., Blanco, S. & Jacovkis, P. M. 2004 Signalseparation with almost periodic components: a wavelets basedmethod. Revista Mexicana de Fisica 50(2), 179–186.

Sahoo, G. B. & Ray, C. 2006 Flow forecasting for a Hawaii stream usingrating curves and neural networks. J. Hydrol. 317(1), 63–80.

Sahoo, G. B., Schladow, S. G. & Reuter, J. E. 2009 Forecasting streamwater temperature using regression analysis, artificial neuralnetwork, and chaotic non-linear dynamic models. J. Hydrol.378(3-4), 325–342.

Satyajirao, Y. R. & Krishna, B. 2009 Modelling hydrological timeseries data using wavelet neural network analysis. IAHS Publ.333, 101–111.

Sharma, A., Tarboton, D. G. & Lall. U. 1997 Streamflow simulation: anonparametric approach. Wat. Res. Res. 33(3), 291–308.

Sharma, S. K. & Tiwari, K. N. 2009 Bootstrap based artificial neuralnetwork (BANN) analysis for hierarchical prediction of monthlyrunoff in Upper Damodar Valley Catchment. J. Hydrol. 374(3-4),209–222.

Solomatine, D. P. & Ostfeld, A. 2008 Data-driven modelling: some pastexperiences and new approaches. J. Hydroinf. 10(1), 3–22.

Srivastav, R. K., Sudheer, K. P. & Chaubey, I. 2007 A simplifiedapproach to quantifying predictive and parametric uncertainty inartificial neural network hydrologic models. Wat. Res. Res. 43(10),W10407.

Sudheer, K. P. 2005 Knowledge extraction from trained neural networkriver flow models. J. Hydrol. Engng. 10(4), 264–269.

518 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

Sudheer, K. P., Gosain, A.K. & Ramasastri, K.S. 2002 A data-drivenalgorithm for constructing artificial neural network rainfall-runoffmodels. Hydrol. Processes 16(6), 1325–1330.

Sudheer, K. P. & Jain, A. 2004 Explaining the internal behaviour ofartificial neural network river flow models. Hydrol. Processes18(4), 833–844.

Sudheer, K. P. & Jain, S.K. 2003 Radial basis function neural networkfor modeling rating curves. J. Hydrol. Engng. 8(3), 161–164.

Tasker, G. D. & Dunne, P. 1997 Bootstrap position analysis forforecasting low flow frequency. J. Wat. Res. Plann. Mngmnt.123(6), 359–367.

Tiwari, M. K. & Chatterjee, C. 2010 Uncertainty assessment andensemble flood forecasting using bootstrap based artificial neuralnetworks (BANNs). J. Hydrol. 382(1–4), 20–33.

Twomey, J. M. & Smith, A. E. 1998 Bias and variance of validationmethods for function approximation neural networks underconditions of sparse data. IEEE Trans. Syst. Man Cybernet. Part C:Appl. Rev. 28(3), 417–430.

Uhlenbrook, S., Seibert, J., Leibundgut, C. & Rodhe, A. 1999 Predictionuncertainty of conceptual rainfall-runoff models caused byproblems to identify model parameters and structure. Hydrol. Sci.J. 44(5), 779–798.

Wang, D. & Ding, J. 2003 Wavelet network model and its application tothe prediction of hydrology. Nat. Sci. 1(1), 67–71.

Wang, J. & Meng, J. 2007 Research on runoff variations based onwavelet analysis and wavelet neural network model: a case study

of the Heihe River drainage basin (1944-2005). J. Geog. Sci. 17(3),327–338.

Wang, W., Jin, J. & Li, Y. 2009 Prediction of inflow at Three Gorgesdam in Yangtze River with wavelet network model. Wat. Res.Mngmnt. 23(13), 2791–2803.

Wang, W., Vrijling, J. K., Van Gelder, P. H. A. J. M. & Ma, J. 2006Testing for nonlinearity of streamflow processes at differenttimescales. J. Hydrol. 322(1–4), 247–268.

Wu, C. L. & Chau, K. W. 2006 A flood forecasting neural networkmodel with genetic algorithm. Int. J. Environ. Pollut. 28(3–4),261–273.

Wu, C. L., Chau, K. W. & Li, Y. S. 2009 Methods to improve neuralnetwork performance in daily flows prediction. J. Hydrol. 372(1-4), 80–93.

Xingang, D., Ping, W. & Jifan, C. 2003 Multiscale characteristics of therainy season rainfall and interdecadal decaying of summermonsoon in North China. Chin. Sci. Bull. 48(24), 2730– 2734.

Yueqing, X., Shuangcheng, L. & Yunlong, C. 2004 Wavelet analysis ofrainfall variation in the Hebei Plain. Sci. China Ser. D. 48(12),2241–2250.

Zhang, B. L. & Dong, Z. Y. 2001 An adaptive neural-wavelet modelfor short term load forecasting. Elec. Power Syst. Res. 59(2),121–129.

Zhou, H. C., Peng, Y. & Liang, G.-H. 2008 The research of monthlydischarge predictor-corrector model based on waveletdecomposition. Wat. Res. Mngmnt. 22(1), 217–227.

First received 1 July 2009; accepted in revised form 1 February 2010. Available online 1 October 2010

519 M. K. Tiwari & C. Chatterjee 9999 A new wavelet–bootstrap–ANN hybrid model Journal of Hydroinformatics 9999 9999 201113.3

Copyright of Journal of Hydroinformatics is the property of IWA Publishing and its content may not be copied

or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission.

However, users may print, download, or email articles for individual use.