
Automatica 44 (2008) 1595–1607 www.elsevier.com/locate/automatica

Water reservoir control under economic, social and environmental constraints

Andrea Castelletti∗, Francesca Pianosi, Rodolfo Soncini-Sessa

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy

Received 31 March 2007; received in revised form 28 March 2008; accepted 31 March 2008. Available online 5 May 2008

Abstract

Although great progress has been made in the last 40 years, efficient operation of water reservoir systems still remains a very active research area. The combination of multiple water uses, non-linearities in the model and in the objectives, strong uncertainties in inputs and high-dimensional state make the problem challenging and intriguing. The purpose of this paper is to review, in a strict Control Theory perspective, recent and significant advances in designing management policies for water reservoir networks, under economic, social and environmental constraints. A general and thorough problem formulation is provided, along with a description of traditional solution techniques, their limitations and possible alternative approaches.

© 2008 Elsevier Ltd. All rights reserved.

Keywords: Stochastic control; Nonlinear control; Multiobjective optimisation; Multipurpose water reservoirs; Uncertain dynamic systems

1. Introduction

Accounting for almost 20% of the world's electrical output, hydropower is currently the world's largest renewable source of electricity. Although it typically costs more per kWh than burning coal, oil or natural gas, hydropower is a virtually non-polluting way of producing electricity: hydroplants do not emit any of the standard atmospheric pollutants, such as carbon dioxide or sulphur dioxide, that contribute to global warming and acid rain. It can therefore make a valuable contribution to meeting the Kyoto requirements on carbon emission reduction.¹ In addition, plant operating costs are usually low because there are no fuel costs and maintenance requirements are minimal: hydropower is essentially inflation-proof.

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Editor Alain Haurie.

∗ Corresponding author. Tel.: +39 2 23999601; fax: +39 2 23999611. E-mail addresses: [email protected] (A. Castelletti), [email protected] (F. Pianosi), [email protected] (R. Soncini-Sessa).

¹ This assertion is questioned by a few recent studies, e.g. Fearnside (2004), on large tropical reservoirs. They suggested that decaying vegetation, submerged by flooding, may give off quantities of greenhouse gases equivalent to those from other sources of electricity, and that consequently hydropower would not, after all, be a panacea for climate change. This is still the focus of active debate (Rosa, Santos, Matvienko, Santos, & Sikar, 2004).

0005-1098/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2008.03.003

However, hydropower is not without economic, social and environmental impacts. By significantly altering water levels and downstream flow patterns, the normal operation of hydropower storage systems, namely reservoir networks, can have severe negative effects on a range of economic interests (irrigated agriculture, fisheries, forestry, etc.), on the local human population (potable water supply, flooding, navigation and recreation activities) and on the flora and fauna that inhabit the surrounding areas and downstream water bodies. Typically, hydropower-irrigation conflicts arise from the difference in seasonal timing between power demand and irrigation water needs, the former peaking in winter and the latter during the summer growing season. Another common effect of the spring-to-winter reallocation of water volumes operated by hydropower reservoirs is an increased risk of flooding on reservoir shores during spring, snowmelt-driven floods. Superimposed on these seasonal conflicts are short-term, even hourly, fluctuations in downstream flows in response to changing daily demands for hydropower (hydropeaking), which inevitably result in a conflict between hydropower and the downstream environment: the existing flow pattern of the river is disrupted and, with it, all the habitats and species that depend on that pattern are endangered.


Fortunately, a shift in water-use priorities, away from exclusive power generation and towards a multi-purpose, multi-stakeholder and integrated perspective (see GWP-Global Water Partnership (2000)), has been underway in many parts of the world for several decades, driven by a growing awareness of the potentially disastrous effects of a global 'water crisis' (Brown, 2001) and by increasing worldwide opposition to new large storage projects (McCully, 2001). In many systems (see for instance Kotchen, Moore, Lupi, and Rutherford (2006) and references therein), current regulation agreements are being relicensed with the explicit inclusion of economic, social and environmental constraints on hydropower operations, e.g. by imposing regulation ranges and minimum environmental flows on downstream rivers.

The existence of multiple water uses, often not planned in the original design of the hydropower storage system, greatly complicates its current operation: the number of possible operational alternatives multiplies, and existing conflicts and competition make it harder to decide which one to adopt. This partly explains why many systems worldwide are failing to deliver the level of performance for which they were designed (World Commission on Dams, 2000). Although only 10% of the world's hydropower potential is currently being exploited (Khagram, 2004), the combination of few additional sites, huge construction costs, reduced financial support from the World Bank and the above-mentioned changing water-use priorities sharply limits the potential for expansion. Attention must therefore focus on the efficient and effective operation of existing multipurpose reservoir networks, with the aim of maximizing their performance with respect to all the water uses involved. This requires a rational approach that considers all the economic, social and environmental interests in a fully integrated manner and allows more efficient (in the Pareto sense) operational alternatives to be selected systematically with respect to these interests.

The problem of finding operational alternatives for efficiently managing water reservoir networks has fascinated analysts since the pioneering work of Rippl (1883). However, only towards the end of the 1950s was it understood (Maas et al., 1962), under the influence of the budding research areas of Control Theory and Operations Research, that the rational approach to the problem is to formulate operational alternatives as feedback control policies. When this new approach was introduced, the design of a policy appeared, from the system analysts' viewpoint, to be a well-posed problem whose only difficulties were computational, owing to the limited speed and memory of the computers then available. In the 1970s, thanks to a rapid increase in computer performance, increasingly complex algorithms were developed (see, among others, Heidari, Chow, Kokotovic, and Meredith (1971), Sniedovich (1979), Su and Deininger (1974), Tauxe, Inman, and Mades (1979), Yeh (1985) and references therein). These algorithms claimed to solve different problems, but actually solved different formulations of the same problem: a Stochastic Optimal Control Problem for a periodic system.

Despite having been studied extensively over the last few decades, the problem remains intellectually challenging, since its formulation and solution pose a number of intriguing difficulties, namely:

(1) Multiple and conflicting interests (objectives). The keystone hypothesis upon which the simple approach of the 1970s was based proves unfounded: policy design is not well structured and fully rational. As a consequence, the concept of optimality must be replaced by that of Pareto efficiency: policies have to be designed through the formulation of a multi-objective (MO) problem, whose solution is no longer a mere technical exercise but requires consideration of the preference structure of the parties involved. To separate technical issues from preference aspects, the most commonly adopted strategy is to reduce the MO problem to a set of parametric single-objective (SO) optimal control problems. Their solution, as the parameter varies, provides the Pareto Efficient Decision Set, from which the Pareto Frontier is derived. Solution techniques from Control Theory are used to solve the SO problems, while the choice of a point on the Frontier requires the adoption of Decision Making methods and, in the presence of multiple decision makers, of Negotiation Theory. In this paper we will consider only aspects pertaining to Control Theory, i.e. the formulation and solution of the SO control problem.

(2) The model of the system to be controlled is highly non-linear.

(3) Objective functions are usually non-linear and strongly asymmetric.

(4) Strong uncertainties (e.g. of the inflow) affect the system and cannot be neglected.

(5) The problem is usually formulated over an infinite horizon, since the lifetime of the water system under examination cannot be arbitrarily limited.

(6) Different formulations of the problem are possible: aggregation of step-cost functions over time can be performed using summation or other operators, such as the max; the criterion for filtering uncertainties depends on the risk aversion of the parties involved; the infinite horizon can be handled in different ways, depending on the predominant characteristics of the objectives (economic or physical) and on whether or not transient periods have to be considered.

(7) The policy actuator is a human being, i.e. the policy does not directly control the penstock sluice gates but proposes a control decision to the regulator. Moreover, according to the World Commission on Dams (2000), the 'non-prescriptive' nature of the policy is the key to its acceptance by the regulator. Therefore set-valued policies, which provide a set of equivalent controls, are preferred to the point-valued policies traditionally employed to control electro-mechanical systems.

The purpose of this paper is to review, in a strict Control Theory perspective, the most recent advances in designing management policies for water reservoir networks under economic, social and environmental constraints. Emphasis is given to the technical implications that the very nature of water reservoir systems has for the formulation and solution of the problem, with the aim of clarifying why many traditional control techniques, widely and successfully applied in other fields, are inapplicable or strongly limited here. Note that the term reservoir network is used here to refer to a physical network of hydraulically interconnected catchments, reservoirs, power plants and other users (including environmental services). Clearly, as a hydropower generator, the reservoir network may be electrically interconnected with other generators within an energy market, but these interactions will not be discussed in this paper. Many authors (see, among others, Breton, Haurie, and Kalocsai (1978), Thompson, Davison, and Rasmussen (2004) and references therein) discuss this topic; however, they usually consider hydropower production only, while here the truly multi-objective nature of the problem is recognized.

The paper is organized as follows. Section 2 introduces models of the water system components, formulates the policy design problem as an MO problem and, finally, transforms it into an SO problem. Section 3 shows why Stochastic Dynamic Programming (SDP) is the most natural algorithm for solving the design problem and why, by contrast, the well-known and appealing LQG approach is totally unsuited. Section 4 presents different approaches that have been proposed to cope with the computational complexity of SDP, paying attention to the particular implications of each of them in the context considered here. Finally, the concluding section discusses the limits of this review and its neighboring areas.

2. Formulation of policy design problems

2.1. Model of the system

The system under study is composed of N reservoirs that drain water from M catchments. Reservoirs are connected with each other and with water users, such as power plants or irrigation districts, by a network of natural and artificial canals. Even though the physical processes involved in the system are obviously continuous in time, the model is discrete-time because decisions are taken at a discrete decision time-step. Here we briefly discuss models of reservoirs, catchments and water users with the aim of highlighting those characteristics that influence the formulation and solution of the control problem, while we take models of canals, diversion dams and junctions for granted. For a more detailed discussion of the model of the water system, see Soncini-Sessa, Castelletti, and Weber (2007).

2.1.1. Reservoirs

The model of the j-th water reservoir is based on the mass balance equation

$$s^j_{t+1} = s^j_t + q^j_{t+1} - r^j_{t+1} \qquad (1a)$$

where $s^j_t$ is the storage in the j-th reservoir at time t, $q^j_{t+1}$ is the inflow volume in the time interval [t, t+1) and $r^j_{t+1}$ is the release in the same interval. Other terms, such as direct precipitation on the reservoir, infiltration and evaporation, have been neglected, but they can be added to the mass balance when necessary. The time subscript of each variable denotes the time instant at which it assumes a deterministic value: e.g. lake storage is measured at each time t and is thus denoted by $s_t$, while the inflow in the interval [t, t+1) is denoted by $q_{t+1}$, since it can be known deterministically only at the end of the interval.

The inflow $q^j_{t+1}$ is the outflow of a drainage network fed by the releases $r^i_{t+1}$ ($i = 1, \dots$; $i \neq j$) of the upstream reservoirs (if any) and by the outflows $a^k_{t+1}$ ($k = 1, \dots$) from natural, uncontrolled catchments; the latter are described in the following section.

The release $r^j_{t+1}$ is a function of the control variable $u^j_t$ (the release decision made at time t for reservoir j), of the storage $s^j_t$ and of the inflow $q^j_{t+1}$

$$r^j_{t+1} = R^j_t(s^j_t, u^j_t, q^j_{t+1}).$$

The function $R^j_t(\cdot)$ is called the release function; it is a non-linear periodic function of the following form

$$R^j_t(s^j_t, u^j_t, q^j_{t+1}) = \begin{cases} v^j_t(s^j_t, q^j_{t+1}) & \text{if } u^j_t < v^j_t(s^j_t, q^j_{t+1}) \\ V^j_t(s^j_t, q^j_{t+1}) & \text{if } u^j_t > V^j_t(s^j_t, q^j_{t+1}) \\ u^j_t & \text{otherwise} \end{cases} \qquad (2)$$

where $v^j_t(\cdot)$ and $V^j_t(\cdot)$ are the minimum and maximum releases that can be produced in the time interval [t, t+1) by keeping all the sluice gates completely closed and completely open, respectively. These two functions are computed by integrating, over the interval [t, t+1), the continuous-time mass balance equation of the j-th reservoir

$$\frac{ds^j}{d\zeta} = q^j(\zeta) - r^j(\zeta) \qquad (3)$$

where the instantaneous inflow $q^j(\zeta)$ is assumed to be constant and equal to $q^j_{t+1}/\Delta$, $\Delta$ being the length of the modelling step, and the instantaneous outflow $r^j(\zeta)$ is given by the minimum $N^{min,j}(s^j(\zeta))$ or maximum $N^{max,j}(s^j(\zeta))$ storage-discharge relation of the reservoir sluice gates and spillway (Piccardi & Soncini-Sessa, 1991). Introducing the release function thus allows physical constraints to be included in the model, which makes it possible for the actual release $r^j_{t+1}$ to differ from the release decision $u^j_t$, e.g. when the available water is not sufficient to realize the decision or when a spill takes place.
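The interplay of Eqs. (1a), (2) and (3) can be illustrated with a minimal Python sketch. The storage-discharge relations `n_min` and `n_max`, the spillway threshold `S_SPILL` and all numerical coefficients are hypothetical placeholders, not taken from the paper; Eq. (3) is integrated here with a simple explicit Euler scheme, which is only one possible choice.

```python
import math

# Hypothetical storage-discharge relations (flow per unit time); NOT from the paper.
S_SPILL = 100.0  # storage above which the spillway starts discharging


def n_min(s):
    """Gates completely closed: only the spillway releases water."""
    return 0.5 * max(s - S_SPILL, 0.0)


def n_max(s):
    """Gates completely open: spillway plus gate outflow."""
    return n_min(s) + 2.0 * math.sqrt(max(s, 0.0))


def released_volume(s0, q, delta, outflow, n_sub=1000):
    """Volume released over [t, t+delta) by integrating Eq. (3) with explicit
    Euler, assuming a constant instantaneous inflow q / delta."""
    dt = delta / n_sub
    s, total = s0, 0.0
    for _ in range(n_sub):
        # Never release more water than is available in this sub-step.
        rate = min(outflow(s), s / dt + q / delta)
        s += (q / delta - rate) * dt
        total += rate * dt
    return total


def release(s, u, q, delta=1.0):
    """Release function of Eq. (2): clip the decision u into [v, V]."""
    v = released_volume(s, q, delta, n_min)
    V = released_volume(s, q, delta, n_max)
    return min(max(u, v), V)


def next_storage(s, u, q, delta=1.0):
    """Mass balance of Eq. (1a): s_{t+1} = s_t + q_{t+1} - r_{t+1}."""
    return s + q - release(s, u, q, delta)
```

With these placeholder relations, `release(50.0, 5.0, 10.0)` returns the decision 5.0 unchanged, a decision of 1000.0 is cut down to the gates-fully-open volume, and a negative decision is lifted to the minimum release, which is how unfeasible decisions become feasible.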

2.1.2. Uncontrolled catchments

Physically-based models describing the outflows from natural, uncontrolled catchments are difficult to include in the formulation of the reservoir control problem: the state of such models is usually not observable and, more importantly, it can be very large (thousands of variables when the model is spatially distributed), thus leading to computational problems. Simple statistical models are therefore generally adopted; for example, the outflow $a^k_{t+1}$ from the k-th uncontrolled catchment can be assumed to be a cyclostationary, lognormal, stochastic process with periodic mean and standard deviation $\mu^k_t$ and $\sigma^k_t$, whose dynamics are described by

$$a^k_{t+1} = \exp\left(y^k_{t+1}\,\sigma^k_t + \mu^k_t\right) \qquad (4a)$$

$$A^k(z^{-1})\, y^k_{t+1} = \varepsilon^k_{t+1} \qquad (4b)$$

where $A^k$ is a polynomial in the backward shift operator $z^{-1}$ and $\varepsilon^k_{t+1}$ is a zero-mean Gaussian white noise with constant variance.
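A minimal Python sketch of Eqs. (4a)-(4b), assuming a first-order polynomial $A^k(z^{-1}) = 1 - \phi z^{-1}$ (the paper allows any order $p_k$); the function name and the parameter values are hypothetical. Choosing `noise_std = sqrt(1 - phi**2)` gives $y$ unit stationary variance, so that $\mu_t$ and $\sigma_t$ are the mean and standard deviation of $\ln a_{t+1}$.

```python
import math
import random


def simulate_outflow(mu, sigma, phi, noise_std, n_steps, seed=0):
    """Simulate a_{t+1} = exp(y_{t+1} * sigma_t + mu_t), Eq. (4a), with the
    first-order choice A(z^-1) = 1 - phi * z^-1 in Eq. (4b), i.e.
    y_{t+1} = phi * y_t + eps_{t+1}.

    mu, sigma: periodic parameters, one value per time step of the period.
    """
    rng = random.Random(seed)
    period = len(mu)
    y = 0.0  # initial condition of the AR(1) state (assumed)
    outflows = []
    for t in range(n_steps):
        # Zero-mean Gaussian white noise with constant standard deviation.
        y = phi * y + rng.gauss(0.0, noise_std)
        tau = t % period  # position within the period (cyclostationarity)
        outflows.append(math.exp(y * sigma[tau] + mu[tau]))
    return outflows
```

For example, `simulate_outflow([0.0, 1.0], [0.5, 0.5], 0.8, 0.6, 100)` produces 100 strictly positive outflows alternating between a "low" and a "high" season; with `noise_std = 0.0` the series collapses onto the periodic medians `exp(mu[t])`.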

2.1.3. Water users and environmental, social and economic constraints

The presence of various water users and other social and environmental interests can be formalized either by defining step-cost functions associated with the system's transitions or by imposing constraints on some variables. Given the variety of issues that can be considered in reservoir control problems, a general discussion of the topic is not possible; in the following we therefore provide only some examples for the most common water users and environmental constraints.

The river and riparian ecosystem downstream of the j-th reservoir can be safeguarded by introducing a minimum environmental flow (MEF) constraint on the reservoir release, while conflict and competition among water users can be mitigated by imposing a regulation range on reservoir storage. These constraints are accounted for by suitably modifying the minimum and maximum instantaneous storage-discharge relations $N^{min,j}(\cdot)$ and $N^{max,j}(\cdot)$ that are used in the computation of the minimum and maximum release volumes $v^j_t(\cdot)$ and $V^j_t(\cdot)$ (Section 2.1.1). For example, let $\underline{q}^j_t$ be the MEF value for the time interval [t, t+1) and $(s^{min,j}_t, s^{max,j}_t)$ the regulation range for the same interval. Then the minimum instantaneous storage-discharge relation is modified as

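$$N^{min,j}_t\big(s^j(\zeta), q^j_{t+1}\big) = \begin{cases} \min\left\{\dfrac{\underline{q}^j_t}{\Delta}, \dfrac{q^j_{t+1}}{\Delta}\right\} & \text{if } N^{min,j}\big(s^j(\zeta)\big) \le \dfrac{\underline{q}^j_t}{\Delta} \\ N^{max,j}\big(s^j(\zeta)\big) & \text{otherwise, if } s^j(\zeta) > s^{max,j}_t \\ N^{min,j}\big(s^j(\zeta)\big) & \text{otherwise.} \end{cases} \qquad (5)$$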
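Eq. (5) translates directly into a small function. The Python sketch below follows the equation branch by branch, with the original relations passed in as callables; the function and argument names, and the linear relations used in the example, are hypothetical.

```python
def n_min_modified(s, q_next, mef, s_max, delta, n_min, n_max):
    """Modified minimum storage-discharge relation of Eq. (5).

    s: instantaneous storage s^j(zeta); q_next: inflow volume q^j_{t+1};
    mef: MEF volume for [t, t+1); s_max: upper bound of the regulation range;
    delta: length of the modelling step; n_min, n_max: original relations.
    """
    if n_min(s) <= mef / delta:
        # Minimum release rate is the MEF rate, capped at the inflow rate.
        return min(mef / delta, q_next / delta)
    if s > s_max:
        # Above the regulation range: behave as if all gates were open.
        return n_max(s)
    return n_min(s)
```

For instance, with `n_min = lambda s: s / 10` and `n_max = lambda s: s / 10 + 20`, an MEF of 5 and `s_max = 80`, a storage of 30 yields the MEF rate 5 (or the inflow rate, if smaller), a storage of 60 keeps the original value 6, and a storage of 100 switches to the gates-open value 30.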

The interests of the hydropower company owning the j-th plant are described by introducing a step-cost function that expresses the lost revenue in the interval [t, t+1)

$$l^j_{t+1} = \vartheta^j_t \left(w^j_t - G^j_{t+1}\right)^+ \qquad (6a)$$

where $\vartheta^j_t$ is the price of electricity averaged over the interval [t, t+1), $w^j_t$ is the demand for electricity, $G^j_{t+1}$ is the electricity production, and the operator $z^+ = \max(z, 0)$ is used. Demand for electricity and its price can be either computed using complex dynamic models (see for example Thompson et al. (2004)) or simply specified as a given scenario. The production $G^j_{t+1}$ is computed as

$$G^j_{t+1} = \eta^j\, q^{d,j}_{t+1}\, H^j_t \qquad (6b)$$

where $\eta^j$ is a unit conversion and efficiency factor, $q^{d,j}_{t+1}$ is the flow in the penstock and $H^j_t$ is the hydraulic head. When the reservoir itself is the pondage of the plant, $H^j_t$ depends on the level of the water surface in the reservoir and thus on its storage $s^j_t$. The flow $q^{d,j}_{t+1}$ does not always coincide with the reservoir release, owing to the presence of a minimum $q^{min,j}$ and a maximum $q^{max,j}$ turbinable flow in the plant and/or of a MEF $\underline{q}^j_t$

$$q^{d,j}_{t+1} = \begin{cases} \min\left(\left(r^j_{t+1} - \underline{q}^j_t\right)^+, q^{max,j}\right) & \text{if } \left(r^j_{t+1} - \underline{q}^j_t\right)^+ \ge q^{min,j} \\ 0 & \text{otherwise.} \end{cases} \qquad (6c)$$
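The chain from release to lost revenue in Eqs. (6a)-(6c) can be sketched in Python as three small functions. Names and numbers are hypothetical, and the head is passed in as a given number; for a pondage plant it would be computed from storage through a level-storage relation that is not specified here.

```python
def turbined_flow(r, mef, q_min, q_max):
    """Eq. (6c): flow through the penstock given the release r and the MEF."""
    available = max(r - mef, 0.0)      # (r - MEF)^+
    if available >= q_min:
        return min(available, q_max)   # capped at the maximum turbinable flow
    return 0.0                         # below the minimum turbinable flow


def production(q_d, head, eta):
    """Eq. (6b): G = eta * q_d * H."""
    return eta * q_d * head


def lost_revenue(price, demand, prod):
    """Eq. (6a): l = price * (demand - production)^+."""
    return price * max(demand - prod, 0.0)
```

With a release of 12, an MEF of 2 and turbinable limits [3, 8], the plant turbines 8 units; with a release of only 4 the residual flow of 2 is below the minimum turbinable flow and production stops entirely, which is one source of the strong non-linearity mentioned in the introduction.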

The interests of the l-th irrigation district may be described by introducing a step-cost function that expresses the supply deficit

$$d^l_{t+1} = \left(w^l_t - q^{d,l}_{t+1}\right)^+ \qquad (7)$$

where $q^{d,l}_{t+1}$ is the flow supplied to the irrigation district and $w^l_t$ is its water demand. The latter can be specified as a given scenario or it can be the output of a dynamical model of crop growth (Wallach, Makowski, & Jones, 2006).

2.1.4. Global model

The model of the water system is obtained by suitably aggregating the models of the reservoirs, catchments, water users, canals, diversion dams and junctions that compose it. The result is a discrete-time, periodic, non-linear, stochastic (or uncertain) system of the form

$$x_{t+1} = f_t(x_t, u_t, \varepsilon_{t+1}) \qquad (8)$$

where $x_t \in \mathbb{R}^{n_x}$, $u_t \in \mathbb{R}^{n_u}$ and $\varepsilon_t \in \mathbb{R}^{n_\varepsilon}$ are the state, control and disturbance vectors. The state is composed of the state variables of the N reservoirs, i.e. their storages, the state variables of the M catchments and, when applicable, the states of the canals and water users

$$x_t = \left[s^1_t, \dots, s^N_t;\ y^1_t, \dots, y^1_{t-p_1};\ \dots;\ y^M_t, \dots, y^M_{t-p_M};\ \dots\right]^T \qquad (9)$$

where $p_k$ is the order of the polynomial $A^k(z^{-1})$ in Eq. (4b). The control vector is composed of the N release decisions for the N reservoirs

$$u_t = \left[u^1_t, \dots, u^N_t\right]^T.$$

The disturbance vector is composed of the M random disturbances that appear in the models of the uncontrolled catchments and of any other random variable used to describe random terms in the reservoir mass balance equation (e.g. evaporation, infiltration, etc.) or in the models of canals and water users. For example, if the uncontrolled catchments are described by models of the form (4b) and no other disturbance affects the water system, the disturbance vector is given by

$$\varepsilon_{t+1} = \left[\varepsilon^1_{t+1}, \dots, \varepsilon^M_{t+1}\right]^T.$$

Depending on how the scalar disturbances have been modelled, the disturbance vector $\varepsilon_{t+1}$ is either uncertain or stochastic, and is described by a membership set $\Xi_t$ or a pdf $\phi_t(\cdot)$, respectively. At each time t, either $\Xi_t$ or $\phi_t(\cdot)$ may be a function of the state and control at the same time

$$\varepsilon_{t+1} \sim \phi_t(\cdot \mid x_t, u_t) \quad \text{or} \quad \varepsilon_{t+1} \in \Xi_t(x_t, u_t). \qquad (10)$$
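To make Eq. (8) concrete, the following Python sketch composes a single reservoir and a single first-order catchment into one transition function with state $x_t = (s_t, y_t)$; for a first-order catchment model only the latest $y_t$ needs to be stored. All parameters are hypothetical, and the bounds $v_t$ and $V_t$ are replaced by crude closed forms (forced spill above a fixed capacity; release at most the available water) instead of integrating Eq. (3).

```python
import math

# Hypothetical parameters for one reservoir fed by one AR(1) catchment.
CAPACITY = 100.0      # storage above which water is spilled
PHI = 0.8             # AR(1) coefficient of the catchment model
MU = [0.0, 1.0]       # periodic log-mean (period T = 2)
SIGMA = [0.5, 0.5]    # periodic log-standard deviation


def f(t, x, u, eps):
    """One step of x_{t+1} = f_t(x_t, u_t, eps_{t+1}), Eq. (8), with x = (s, y)."""
    s, y = x
    tau = t % len(MU)
    y_next = PHI * y + eps                        # catchment dynamics, Eq. (4b)
    q = math.exp(y_next * SIGMA[tau] + MU[tau])   # inflow = catchment outflow, Eq. (4a)
    v = max(s + q - CAPACITY, 0.0)                # crude minimum release (forced spill)
    V = s + q                                     # crude maximum release (all water)
    r = min(max(u, v), V)                         # release function, Eq. (2)
    return (s + q - r, y_next)                    # mass balance, Eq. (1a)
```

The function is periodic in t through `MU` and `SIGMA`, non-linear through the exponential and the clipping, and stochastic through `eps`, matching the qualitative description of the global model.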

2.2. The control problem

For each of the m issues present in the system (water demand for hydropower production and/or irrigation, flood control, respect of environmental quality standards, etc.) an objective function $J^i$ (with $i = 1, \dots, m$) can be defined to express the cost paid by the i-th sector over the time horizon [0, h]

$$J^i = \Psi_{\varepsilon_1, \dots, \varepsilon_h}\left[\Phi\left(g^i_0(x_0, u_0, \varepsilon_1), \dots, g^i_{h-1}(x_{h-1}, u_{h-1}, \varepsilon_h), g^i_h(x_h)\right)\right] \qquad (11)$$

where $g^i_t(\cdot)$ for $t = 0, \dots, h-1$ are the step-cost functions associated with the transitions from t to t+1, $g^i_h(\cdot)$ is a penalty function over the final state, $\Phi$ is an operator for aggregation over time and $\Psi$ is a statistic used to filter the disturbance. Examples of step-cost functions are Eqs. (6a) and (7). Common choices for aggregation over time are the sum ($\Phi = \Sigma$) and the maximum ($\Phi = \max$). As for the filtering operator, the expected value is often used ($\Psi = E$); however, the maximum ($\Psi = \max$) is preferred when stakeholders are risk averse (Orlovski, Rinaldi, & Soncini-Sessa, 1983, 1984; Soncini-Sessa, Zuleta, & Piccardi, 1991). Only two combinations of these operators are of interest for practical applications: $\Psi = E$ and $\Phi = \Sigma$ (the so-called Laplace problem), and both $\Psi$ and $\Phi$ equal to the maximum operator (the Wald problem). In the following we therefore consider only these two cases.
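Given the step-costs of one policy along a finite set of disturbance scenarios, the two objectives can be evaluated as in the Python sketch below. It assumes equally likely scenarios, so that the expected value is approximated by a sample average; scenario generation and the final penalty $g_h$ are left out, and the function names are hypothetical.

```python
def laplace(scenario_costs):
    """Psi = E, Phi = sum: average over equally likely scenarios of summed step-costs."""
    return sum(sum(costs) for costs in scenario_costs) / len(scenario_costs)


def wald(scenario_costs):
    """Psi = max, Phi = max: worst step-cost over all scenarios and time steps."""
    return max(max(costs) for costs in scenario_costs)
```

With `[[1.0, 2.0, 3.0], [0.0, 0.0, 9.0]]` the Laplace objective is 7.5, while the Wald objective is 9.0, driven entirely by the single worst step of the second scenario; this is the risk-averse behaviour the Wald criterion is meant to capture.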

At each time step, the release decision for each reservoir is given by the control law

$$u_t = m_t(x_t). \qquad (12)$$

The aim of the control problem is to define the sequence of control laws $m_t(\cdot)$ over the horizon [0, h−1], i.e. the release policy

$$p = \left[m_0(\cdot), \dots, m_{h-1}(\cdot)\right]. \qquad (13)$$

The multi-objective (MO) control problem is therefore formulated as

$$\min_p \left[J^1, J^2, \dots, J^m\right] \qquad (14)$$

subject to the constraints (8), (10), (12) and (13) and given $x_0$. The pdf formulation in Eq. (10) is used when the filtering criterion $\Psi$ in Eq. (11) is the expected value; the membership-set formulation is used when $\Psi$ is the maximum. Note that the control variable is unconstrained, because unfeasible decisions are transformed into feasible ones by the release function of the reservoir model.

As anticipated in the introduction, an 'optimal' solution to the control problem, i.e. a policy $p^*$ that minimizes all the objectives simultaneously, does not generally exist. In a multi-objective framework, the solution is the set P of Pareto efficient policies (see for instance Miettinen (1999)). Each efficient policy in P can be computed by solving the following single-objective (SO) optimal control problem

$$\min_p J \qquad (15)$$

subject to the constraints (8), (10), (12) and (13) and given $x_0$, where J is derived from $[J^1, J^2, \dots, J^m]$ with a suitable method (see for instance Lotov, Bushenkov, and Kamenev (2004)) and is of the form

$$J = \Psi_{\varepsilon_1, \dots, \varepsilon_h}\left[\Phi\left(g_0(x_0, u_0, \varepsilon_1), \dots, g_{h-1}(x_{h-1}, u_{h-1}, \varepsilon_h), g_h(x_h)\right)\right] \qquad (16)$$

where $g_t(\cdot)$ and $g_h(\cdot)$ are the aggregate step-cost and penalty functions obtained from $g^i_t(\cdot)$ and $g^i_h(\cdot)$ (with $i = 1, \dots, m$) according to the aggregation method used to reduce the MO problem to an SO problem. The choice of the method is constrained by the formulation of the problem that has been adopted, and in particular by the choice of the filtering operator $\Psi$. Note that the number of efficient alternatives is generally infinite, so only a finite subset of P can actually be computed.
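The notion of Pareto efficiency can be illustrated over a finite set of candidate policies, each summarised by its objective vector $[J^1, \dots, J^m]$ (minimisation). In this Python sketch the weighted sum is only one of the possible methods for deriving J, and it can reach only efficient points lying on the convex hull of the frontier; all names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all <=, at least one <)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_efficient(objectives):
    """Indices of the non-dominated vectors in a finite list."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]


def weighted_sum_choice(objectives, weights):
    """Index minimising J = sum_i w_i * J^i (weighting method)."""
    return min(range(len(objectives)),
               key=lambda i: sum(w * j for w, j in zip(weights, objectives[i])))
```

For the four candidates `[(1, 5), (2, 2), (3, 3), (5, 1)]`, the third is dominated by the second, so the efficient set is `[0, 1, 3]`; sweeping the weights from `(1, 0)` through `(0.5, 0.5)` to `(0, 1)` selects candidates 0, 1 and 3 in turn, mirroring how the parametric SO problems trace the frontier.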

In the context of environmental system management, the choice of the length of the time horizon and of the penalty function $g_h(x_h)$ is critical, since the lifetime of the system is, in principle, infinite. It is therefore more convenient to use an infinite horizon and let $g_h(\cdot) = 0$. If the model of the system and all the step-cost functions are cyclostationary with period T, the problem is well-posed and its solution is a periodic policy. The SO problem over an infinite horizon is thus formulated as

$$\min_p \lim_{h \to \infty} J \qquad (17)$$

subject to (8), (10) and (12), given $x_0$ and

$$p = \left[m_0(\cdot), \dots, m_{T-1}(\cdot)\right] \qquad (18)$$

instead of (13). Note that if the aggregation over time consists of summing the step-costs, i.e. $\Phi = \Sigma$ in Eq. (16), the objective function must be adjusted to avoid divergence, because there is no guarantee that the controlled system will converge to a stable cycle in which all costs are zero. To overcome this difficulty, the objective function can be defined as the Total Discounted Cost (TDC)

$$J = \lim_{h \to \infty} \Psi_{\varepsilon_1, \dots, \varepsilon_h}\left[\sum_{t=0}^{h} \gamma^t g_t(x_t, u_t, \varepsilon_{t+1})\right] \qquad (19)$$

with $0 < \gamma < 1$, or as the Average Expected Value (AEV)

$$J = \lim_{h \to \infty} \Psi_{\varepsilon_1, \dots, \varepsilon_h}\left[\frac{1}{h+1} \sum_{t=0}^{h} g_t(x_t, u_t, \varepsilon_{t+1})\right]. \qquad (20)$$

The TDC form gives more weight to short-term, transient conditions and is well suited for expressing economic costs. The AEV, instead, gives emphasis to steady-state conditions and is more suitable when social or environmental costs are considered.
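The different emphasis of Eqs. (19) and (20) can be seen by truncating them at a finite h for a single deterministic cost trajectory, so that $\Psi$ plays no role. This Python sketch and its values are illustrative only; the function names are hypothetical.

```python
def total_discounted_cost(costs, gamma):
    """Truncated Eq. (19) for one trajectory: sum over t of gamma^t * g_t."""
    return sum(gamma ** t * g for t, g in enumerate(costs))


def average_cost(costs):
    """Truncated Eq. (20) for one trajectory: mean of g_0, ..., g_h."""
    return sum(costs) / len(costs)
```

A cost of 10 incurred at t = 0 or at t = 3 (over four steps) yields the same average, 2.5, but with γ = 0.5 the discounted cost drops from 10.0 to 1.25 when the cost is postponed, which is why the TDC favours short-term, transient conditions.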


3. Solving the control problem

3.1. Stochastic dynamic programming

Stochastic Dynamic Programming (SDP) (Bellman, 1957) appears to be the most suitable method for solving problem (15). The first application of (deterministic) dynamic programming to water systems management is probably due to Hall and Buras (1961). Since then, the method has been applied successfully to the control of reservoirs, especially for hydropower production (see, among others, Esogbue (1989), Fults and Hancock (1972), Hall, Butcher, and Esogbue (1968), Heidari et al. (1971), Trott and Yeh (1973), Turgeon (1980)). Beginning in the early 1980s, interest also spread to the stochastic version of dynamic programming for the control of multi-purpose reservoirs and reservoir networks (see the reviews Yakowitz (1982), Yeh (1985) and the contributions Gilbert and Shane (1982), Hooper, Georgakakos, and Lettenmaier (1991), Read (1989), Tejada-Guibert, Johnson, and Stedinger (1995), Vasiliadis and Karamouz (1994)). Note that here, under the name of stochastic dynamic programming, we also include the extension to the uncertain case proposed by Piccardi (1993a,b).

One of the reasons for the success of SDP lies in its wide applicability. In fact, the only conditions required for its application are: (1) the inputs to the model must be either controls or random disturbances, i.e. it is not possible to consider uncontrolled, exogenous, deterministic variables whose values are known in real time (e.g. rainfall measures); (2) the membership-set or the pdf of the disturbance vector must be of the form (10), i.e. the disturbance process must be independent in time or, at time $t$, any dependency on the past must be completely accounted for by the value of the state at the same time; and (3) the step-cost functions $g_t(\cdot)$ must depend only on variables defined for the same time interval.

The Bellman equation for the SO finite-horizon optimal control problem (15) is (Bertsekas, 1976; Piccardi, 1993a)

$$H_t(x_t) = \min_{u_t} \Psi_{\varepsilon_{t+1}}\Big[\Phi\big[g_t(x_t, u_t, \varepsilon_{t+1}),\, H_{t+1}(x_{t+1})\big]\Big] \qquad (21)$$

where $H_t(\cdot)$ is the optimal cost-to-go for the aggregate objective, and only the following combinations of the operators $\Phi$ and $\Psi$ are considered:

$$\Phi[v, w] = v + w \quad\text{and}\quad \Psi = \mathbb{E}$$

$$\Phi[v, w] = \max\{v, w\} \quad\text{and}\quad \Psi = \max.$$

The solution is computed by initializing $H_h(x_h)$ with $g_h(x_h)$ and recursively computing $H_t(x_t)$ with Eq. (21). Once the optimal cost-to-go has been computed for all time instants $t = h-1, \ldots, 0$, the optimal control law at any time $t$ is derived as

$$m_t(x_t) = \arg\min_{u_t} \Psi_{\varepsilon_{t+1}}\Big[\Phi\big[g_t(x_t, u_t, \varepsilon_{t+1}),\, H_{t+1}(x_{t+1})\big]\Big]. \qquad (22)$$

Thus the optimal control law is a look-up table in which each state value $x_t$ is associated with the optimal control value $u_t$.
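The backward recursion (21)-(22) can be sketched in Python for a hypothetical single reservoir with $\Phi[v, w] = v + w$ and $\Psi = \mathbb{E}$. The toy model (integer storage levels, an i.i.d. discrete inflow pdf satisfying condition (2), a squared deficit with respect to a constant demand and a terminal penalty $g_h$) is an assumption made for illustration, not a model from the paper. The returned `policy` is exactly the look-up table described above: one dictionary per time step mapping storage to release.

```python
# Backward SDP recursion, Eqs. (21)-(22), with Phi[v, w] = v + w and Psi = E.
# Toy single-reservoir model; all constants are hypothetical.

S_MAX = 10                                  # storage capacity (discretised units)
U_MAX = 6                                   # largest release decision
DEMAND = 4                                  # downstream water demand
INFLOWS = [(1, 0.3), (3, 0.5), (6, 0.2)]    # (inflow, probability), i.i.d. in time


def transition(s, u, eps):
    """Mass balance: release limited by available water, excess spilled."""
    release = min(u, s + eps)
    s_next = min(s + eps - release, S_MAX)
    return s_next, release


def step_cost(s, u, eps):
    _, release = transition(s, u, eps)
    return max(DEMAND - release, 0) ** 2     # squared supply deficit


def terminal_cost(s):
    return max(S_MAX // 2 - s, 0) ** 2       # g_h: penalise ending below half full


def solve_sdp(h):
    states, controls = range(S_MAX + 1), range(U_MAX + 1)
    H = {s: terminal_cost(s) for s in states}        # H_h(x_h) = g_h(x_h)
    policy = [None] * h
    for t in reversed(range(h)):                      # t = h-1, ..., 0
        H_new, law = {}, {}
        for s in states:
            # Q(s, u) = E_eps[ g_t(s, u, eps) + H_{t+1}(x_{t+1}) ]
            q = {u: sum(p * (step_cost(s, u, e) + H[transition(s, u, e)[0]])
                        for e, p in INFLOWS)
                 for u in controls}
            law[s] = min(q, key=q.get)                # arg min, Eq. (22)
            H_new[s] = q[law[s]]                      # Eq. (21)
        H, policy[t] = H_new, law
    return H, policy


H0, policy = solve_sdp(h=12)
print(policy[0])     # look-up table m_0: storage -> optimal release
```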

Practical computation of (21) requires that the sets $S_{x_t}$, $S_{u_t}$ and $S_{\varepsilon_t}$ of state, control and disturbance values be finite at each time $t$. If this is not the case, the sets $S_{x_t}$, $S_{u_t}$ and $S_{\varepsilon_t}$ must be discretized and the model replaced by the corresponding automaton. Uniform discretization is suitable when no information is available about the form of the optimal cost-to-go function $H_t(\cdot)$. This intuition is confirmed by some numerical analysis results (Cervellera & Muselli, 2004), which show that the error in the estimation of $H_t(\cdot)$, given the values that it assumes in $P$ points $(x_t^i, H_t(x_t^i))$ with $x_t^i \in S_{x_t}$, is proportional to an index, called the discrepancy index, which expresses the minimum density of the points $x_t^i$ among all subsets of $S_{x_t}$. For fixed $P$, uniform discretization has a low discrepancy index and thus produces a low estimation error. However, when a uniform grid with $N_{x_t}$ values per state component is adopted, $P = N_{x_t}^{n_x}$; thus the number of points $P$ cannot be increased continuously, and the gap between two successive admissible values of $P$ is roughly $n_x N_{x_t}^{n_x-1}$, i.e. it grows exponentially with $n_x - 1$. Methods have been developed (Fang & Wang, 1994; Niederreiter, 1992) to iteratively produce non-uniform discretizations whose discrepancy index decreases polynomially with $P$ (low-discrepancy sequences).
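The sketch below contrasts the sample sizes a uniform grid admits with those of a low-discrepancy sequence. The Halton sequence is used only as one well-known example of such a sequence; the text does not state which construction the cited works adopt, and the function names are hypothetical.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def radical_inverse(i, base):
    """Mirror the base-`base` digits of the integer i about the radix point."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        inv += digit * f
        f /= base
    return inv


def halton(P, n_x):
    """First P points of the n_x-dimensional Halton sequence in [0, 1)^n_x."""
    if n_x > len(PRIMES):
        raise ValueError("add more primes for higher dimensions")
    return [[radical_inverse(i, PRIMES[d]) for d in range(n_x)]
            for i in range(1, P + 1)]          # skip i = 0 (the origin)


def grid_sizes(n_x, max_levels):
    """Admissible P for a uniform grid with N = 2..max_levels values per axis."""
    return [N ** n_x for N in range(2, max_levels + 1)]


print(grid_sizes(3, 6))        # the only P a uniform grid can offer
points = halton(100, 3)        # a sequence can stop at any P
print(len(points), points[:2])
```

For $n_x = 3$, a grid jumps from $P = 64$ to $P = 125$, while the sequence can be stopped at exactly $P = 100$ (or at any other value).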

When an infinite horizon is considered, the idea is still to recursively solve Eq. (21); however, the algorithm is started at time $t = 0$ with a suitable initialization of $H_0(x_0)$, and it continues backwards in time until the optimal cost-to-go function converges to a periodic function of period $T$. The initialization can be chosen arbitrarily when $\Psi = \mathbb{E}$, while it must be

$$H_0(x_0) = \inf_{x_t \in S_{x_t},\, u_t \in S_{u_t},\, \varepsilon_{t+1} \in S_{\varepsilon_{t+1}}} g_t(x_t, u_t, \varepsilon_{t+1})$$

when $\Psi = \max$. If the TDC formulation (19) is used, the operator $\Phi[\cdot,\cdot]$ in the Bellman equation (21) must be defined as $\Phi[v, w] = v + \gamma w$, which guarantees that $H_t(\cdot)$ does not diverge. If instead the AEV formulation (20) is used, the divergence of $H_t(\cdot)$ cannot be avoided if it is recursively computed with Eq. (21). To overcome this difficulty, the idea is to replace $H_t(x_t)$ with the difference between $H_t(x_t)$ and the cost-to-go $H_t(\bar{x}_t)$ of a reference state $\bar{x}_t$. Based on this idea, the Successive Approximation Algorithm (ASA) has been proposed for both the stationary (White, 1963) and the cyclostationary (Su & Deininger, 1972) case. Asymptotic convergence of both algorithms is guaranteed under suitable conditions (see Bertsekas (1976) for the stochastic case and Piccardi (1993a) for the uncertain one), which are always satisfied by real-world water systems.
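A minimal Python sketch of the reference-state idea for the AEV case, in the stationary setting, is relative value iteration: each iterate is shifted by its value at a reference state $\bar{x}$ so that it stays bounded, and that value converges to the optimal average cost. This follows the spirit of White's algorithm rather than reproducing it. The toy reservoir is hypothetical; a rare large inflow that refills the reservoir from any state is added on purpose so that the controlled chain is unichain and aperiodic, which makes convergence hold for this example.

```python
S_MAX, U_MAX, DEMAND = 5, 4, 2
STATES = range(S_MAX + 1)
CONTROLS = range(U_MAX + 1)
# A rare inflow >= S_MAX + U_MAX refills the reservoir from any state, so the
# chain is unichain and aperiodic under every policy (an assumption of the toy).
INFLOWS = [(0, 0.4), (2, 0.45), (10, 0.15)]


def nxt(s, u, e):
    return min(s + e - min(u, s + e), S_MAX)


def cost(s, u, e):
    return max(DEMAND - min(u, s + e), 0) ** 2


def bellman(h):
    """(T h)(s) = min_u E[g(s, u, e) + h(s')]: Phi = sum, Psi = E."""
    return {s: min(sum(p * (cost(s, u, e) + h[nxt(s, u, e)])
                       for e, p in INFLOWS) for u in CONTROLS)
            for s in STATES}


def relative_value_iteration(ref=0, tol=1e-9, max_iter=10_000):
    h = {s: 0.0 for s in STATES}              # arbitrary initialisation
    for _ in range(max_iter):
        Th = bellman(h)
        lam = Th[ref]                         # estimate of the optimal average cost
        h_new = {s: Th[s] - lam for s in STATES}   # values relative to x_ref
        if max(abs(h_new[s] - h[s]) for s in STATES) < tol:
            return lam, h_new
        h = h_new
    raise RuntimeError("relative value iteration did not converge")


avg_cost, rel_values = relative_value_iteration()
print(round(avg_cost, 4))
```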

The main limitation of SDP is its computational complexity. Let $N_{x_t}$, $N_{u_t}$ and $N_{\varepsilon_t}$ be the number of values taken by each component of the discretized state, control and disturbance sets $S_{x_t} \subset \mathbb{R}^{n_x}$, $S_{u_t} \subset \mathbb{R}^{n_u}$ and $S_{\varepsilon_t} \subset \mathbb{R}^{n_\varepsilon}$: the recursive resolution of (21) for $K$ iteration steps (with $K = h$ if the optimization horizon is finite and $K = kT$ if the horizon is infinite, where $T$ is the period and $k$ is usually lower than ten) requires

$$K \cdot \left(N_{x_t}^{n_x} \cdot N_{u_t}^{n_u} \cdot N_{\varepsilon_t}^{n_\varepsilon}\right) \qquad (23)$$

evaluations of the operator $\Phi[\cdot,\cdot]$ in (21). Eq. (23) shows the so-called curse of dimensionality, i.e. the exponential growth of computational complexity with the state and control dimension.


It follows that SDP cannot be applied to water systems with more than a few reservoirs.
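Eq. (23) can be evaluated directly to see how fast the count grows. The numbers below (a daily cyclostationary problem with $T = 365$ and $k = 5$, and 100, 20 and 10 values per storage, release and inflow component, with one of each per reservoir) are hypothetical.

```python
def sdp_evaluations(K, n_x, N_x, n_u, N_u, n_eps, N_eps):
    """Number of evaluations of Phi[., .] predicted by Eq. (23)."""
    return K * (N_x ** n_x) * (N_u ** n_u) * (N_eps ** n_eps)


# Hypothetical problem: K = k * T = 5 * 365, one state, control and
# disturbance component per reservoir.
for n in range(1, 5):
    evals = sdp_evaluations(K=5 * 365, n_x=n, N_x=100, n_u=n, N_u=20,
                            n_eps=n, N_eps=10)
    print(n, f"{evals:.2e}")
```

Each additional reservoir multiplies the count by $100 \cdot 20 \cdot 10 = 20\,000$: about $3.7 \cdot 10^7$ evaluations for one reservoir become about $2.9 \cdot 10^{20}$ for four.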

3.2. Set-valued control policy

Note that Eq. (22) might have more than one solution. If this is the case, the set $M_t$ of all solutions of (22) can be computed. This set contains all the equivalent optimal controls and is a function of the state; thus it is a set-valued control law, and the sequence $P = [M_0(\cdot), \ldots, M_{h-1}(\cdot)]$ can be proved to be the optimal set-valued policy. Aufiero, Soncini-Sessa, and Weber (2001, 2002) prove that $P$ is the 'largest' set-valued policy that solves problem (15). Determining the set-valued policy requires almost the same computing time as determining a point-valued policy, and it can prove to be much more effective for a reservoir control problem. In fact, not only is uniqueness of the solution unnecessary, since control is supposed to be implemented by a human regulator, but it is not even desirable: leaving the regulator the possibility of choosing a control in $M_t$ is preferable, since in this way (s)he can consider other information that is available when the release decision is taken (e.g. down-time periods of some plant) but that has not been included in the model of the system when formulating the control problem. The set-valued policy approach also turns out to be particularly useful when some priority among the objectives can be established a priori (e.g. according to national regulations). In this event, the optimal control problem (15) can be reformulated by decomposing it into a hierarchy of $q$ (with $q \le m$) single- and/or multi-objective subproblems (lexicographic approach), each of which is formulated considering as the feasible control set the optimal set-valued policy obtained by solving the problem at the higher level in the hierarchy. A numerical implementation of the lexicographic approach can be found in Weber, Rizzoli, Soncini-Sessa, and Castelletti (2002).
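The following Python sketch illustrates, for a single state and a single time step, how the set $M_t$ can be extracted from the minimization in (22) and how a lower-priority objective then chooses only inside that set. It is a simplification: in the lexicographic approach proper, each level solves a whole control problem restricted to the set-valued policy of the level above. All Q-values, the tolerance and the two objectives are hypothetical.

```python
def set_valued_law(q_values, tol=1e-9):
    """All controls whose value Psi[Phi[g_t, H_{t+1}]] is within tol of the min.

    q_values: dict control -> value of the bracket in Eq. (22) for one state.
    """
    best = min(q_values.values())
    return {u for u, q in q_values.items() if q <= best + tol}


def lexicographic(q_by_level, tol=1e-9):
    """q_by_level: list of dicts, highest priority first. Each level is
    optimised only over the controls left by the level above."""
    feasible = set(q_by_level[0])
    for q in q_by_level:
        feasible = set_valued_law({u: q[u] for u in feasible}, tol)
    return feasible


# Hypothetical state: releases 2, 3 and 4 are equally good for flood control.
flood = {0: 9.0, 1: 4.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 2.5}
irrigation = {0: 8.0, 1: 6.0, 2: 3.0, 3: 1.5, 4: 0.5, 5: 0.0}

M_t = set_valued_law(flood)
print(sorted(M_t))                            # equivalent optimal releases
print(lexicographic([flood, irrigation]))     # irrigation picks within M_t
```

Release 5 would be best for irrigation alone, but it is excluded because flood control has priority and release 5 is not in $M_t$.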

3.3. Linear Quadratic Gaussian control

If the system were linear and the cost function quadratic, the well-known results of Linear Quadratic Gaussian (LQG) control could be used to solve the optimal control problem. Some authors (see for instance McLaughlin and Velasco (1990), Ozelkan, Galambosi, Fernandes, and Duckstein (1997), Wasimi and Kitanidis (1983)) have followed this approach. In order to obtain a linear model, they simplify the reservoir's mass balance equation by making the release coincide with the release decision, and express it in terms of deviations of storage, control/release and inflow from some pre-computed nominal values. In order to obtain a quadratic cost function, for any $t$ they either replace the step-cost function $g_t(\cdot)$ with its second-order Taylor expansion around the same nominal values (McLaughlin & Velasco, 1990) or directly define it as a linear combination of the squared deviations of storage and

control/release (Ozelkan et al., 1997). As a nominal trajectory of the inflow, they assume its cyclostationary mean, computed over past measurements. As for the storage and control/release trajectories, Ozelkan et al. (1997) use cyclostationary mean values observed in the past; McLaughlin and Velasco (1990) suggest using the trajectories obtained by simulating the system under a simple operating rule and the nominal inflow trajectory.
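As a rough illustration of the LQG scheme just described, the Python sketch below computes the finite-horizon Riccati gains for a scalar deviation model. It is not the exact model of any of the cited papers: it assumes $A = 1$, $B = -1$ (a release deviation reduces the storage deviation), a step-cost $q x_t^2 + r u_t^2$ in the spirit of Ozelkan et al. (1997), and illustrative weights. The helper that clips the control to a feasible release mimics the 'nearest feasible control' fix, which makes the resulting policy sub-optimal.

```python
# Finite-horizon scalar LQ regulator on deviation variables (hypothetical
# weights): x = storage deviation, u = release deviation, w = inflow deviation,
#   x_{t+1} = A x_t + B u_t + w_t,  with A = 1, B = -1 (linearised mass balance),
# step-cost q x_t^2 + r u_t^2 and terminal cost q_f x_h^2.
# With zero-mean additive noise the optimal gains do not depend on w
# (certainty equivalence), so w does not appear below.


def lq_gains(h, A=1.0, B=-1.0, q=1.0, r=0.5, q_f=1.0):
    """Backward Riccati recursion; returns K_0..K_{h-1} with u_t = -K_t x_t."""
    P = q_f
    gains = [0.0] * h
    for t in reversed(range(h)):
        K = B * P * A / (r + B * B * P)
        P = q + A * A * P - A * B * P * K
        gains[t] = K
    return gains


def clipped_release_deviation(x, K, nominal_release, max_release):
    """Apply u = -K x, then project the actual release onto [0, max_release].
    This 'nearest feasible control' is no longer the LQ optimum."""
    u = -K * x
    release = min(max(nominal_release + u, 0.0), max_release)
    return release - nominal_release


gains = lq_gains(h=10)
print([round(-K, 3) for K in gains])   # extra release per unit of extra storage
```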

The advantages of this approach are the well-known advantages of the LQG control scheme. In particular, it does not require discretization and does not suffer from the curse of dimensionality. On the other hand, it requires introducing a number of strong approximations and a priori assumptions which compromise the optimality of the solution. First, the deviation from the nominal inflow value is assumed to be a zero-mean Gaussian noise, either white or modelled as an autoregressive process. However, in most cases this assumption is rather unrealistic, since deviations above the mean value are much greater than deviations below it. Second, linearization of the reservoir model implies that unintentional spills do not occur and that storage is unbounded both above and below. Thus, the policy obtained by solving the LQG problem can suggest infeasible controls; if the difficulty is overcome by adopting the nearest feasible control, the resulting policy is sub-optimal. Third, in most cases the formulation of the step-cost function $g_t(\cdot)$ as a linear combination of the squared deviations of the state and control/release from nominal values is too rough an approximation, since $g_t(\cdot)$ is derived from step-costs $g_t^i(\cdot)$ that are in general strongly asymmetrical (see, for example, Eqs. (6) and (7)). Finally, the policy obtained with the LQG approach aims at maintaining the system on the nominal trajectory, but the latter is assumed a priori and thus it is not the optimal trajectory according to the problem objective, as defined by Eq. (16). The point is that finding the trajectory to be followed is the very purpose of the problem.

4. Reducing computational complexity

Many approaches have been proposed to partially remedy the computational complexity of SDP, e.g. coarse grid approximation, the use of Lagrange multipliers, approximation with Legendre polynomials (Bellman & Dreyfus, 1962; Kaufmann & Cruon, 1967; Larson, 1968) and techniques for particular problem formulations (Luenberger, 1971; Wong & Luenberger, 1968). However, these methods were conceived mainly for deterministic problems and thus are of limited interest for the optimal control of reservoir networks, where the impact of uncertain inputs, especially those due to uncontrolled catchments, cannot be neglected.

In the following sections we present other approaches that have been proposed to overcome the curse of dimensionality. They can be classified according to the strategy adopted for reducing the problem complexity: reducing the degrees of freedom of the control problem (Section 4.1) or modifying the model (Section 4.2).


4.1. Reducing degrees of freedom of the problem

In order to reduce the complexity of the problem by acting on its degrees of freedom, two approaches can be followed: fixing a priori the form of the optimal cost-to-go function, as discussed in Section 4.1.1, or directly fixing the form of the control law, as discussed in Section 4.1.2.

4.1.1. Fixed-class optimal cost-to-go

Instead of computing the exact value of $H_t(\cdot)$ for $N_{x_t}$ state values, the idea is to evaluate it in a smaller number of points ($\tilde{N}_{x_t} < N_{x_t}$) and then interpolate these points with a function of a fixed class. Thereby Eq. (21) must be replaced by

$$H_t(x_t) = \min_{u_t} \Psi_{\varepsilon_{t+1}}\Big[\Phi\big[g_t(x_t, u_t, \varepsilon_{t+1}),\, \tilde{H}_{t+1}(x_{t+1})\big]\Big] \qquad (24)$$

where $\tilde{H}_{t+1}(\cdot)$ is an estimate of the optimal cost-to-go $H_{t+1}(\cdot)$. This estimate is derived from the $\tilde{N}_{x_{t+1}}$ evaluations of $H_{t+1}(\cdot)$ made at the previous step, by interpolating the points $\{(x_{t+1}^i, H_{t+1}(x_{t+1}^i));\ i = 1, \ldots, \tilde{N}_{x_{t+1}}\}$ with a fixed-class

function. As for the choice of the latter, different classes have been proposed, e.g. linear polynomials (Bellman, Kalaba, & Kotkin, 1963; Tsitsiklis & Van Roy, 1996), cubic Hermite polynomials (Foufoula-Georgiou & Kitanidis, 1988) and splines (Johnson, Stedinger, Shoemaker, Li, & Tejada-Guibert, 1993). However, the most successful choice (Castelletti, de Rigo, Rizzoli, Soncini-Sessa, & Weber, 2005, 2007) appears to be that of neural networks, which leads to the so-called Neural Stochastic Dynamic Programming (NSDP) approach. NSDP can be used for both finite and infinite horizons, except for the AEV formulation, since in this case the convergence of the solution algorithm is not guaranteed. As for the other formulations, Bertsekas and Tsitsiklis (1996) proved that, under broad hypotheses, the solution $\tilde{H}_\cdot(\cdot)$ is guaranteed to lie in a bounded neighbourhood of the exact solution $H_\cdot(\cdot)$. A numerical implementation is described in Castelletti et al. (2007).
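The Python sketch below illustrates the fixed-class idea of Eq. (24) on a hypothetical reservoir with continuous storage. The cost-to-go is evaluated only at a few coarse storage values and, since the next storage generally falls between them, $\tilde{H}_{t+1}(\cdot)$ is obtained by interpolation. For brevity it uses piecewise-linear interpolation (in the spirit of the linear and spline interpolation of Johnson et al. (1993)) instead of the neural networks of NSDP; all constants, including $g_h = 0$, are assumptions.

```python
import bisect

S_MAX, DEMAND = 100.0, 20.0
COARSE = [0.0, 25.0, 50.0, 75.0, 100.0]          # the reduced set of evaluation points
CONTROLS = [0.0, 10.0, 20.0, 30.0, 40.0]
INFLOWS = [(5.0, 0.3), (15.0, 0.5), (40.0, 0.2)]


def interpolate(xs, ys, x):
    """Piecewise-linear estimate H~ through (xs, ys); flat outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def step(s, u, e):
    release = min(u, s + e)
    return min(s + e - release, S_MAX), release


def fixed_class_sdp(h):
    values = [0.0] * len(COARSE)                   # H_h = g_h = 0 (assumption)
    for _ in range(h):                              # t = h-1, ..., 0 (stationary toy)
        new_values = []
        for s in COARSE:
            best = float("inf")
            for u in CONTROLS:
                q = 0.0
                for e, p in INFLOWS:
                    s_next, release = step(s, u, e)
                    g = max(DEMAND - release, 0.0) ** 2
                    # the next storage is generally off-grid: use H~_{t+1}, Eq. (24)
                    q += p * (g + interpolate(COARSE, values, s_next))
                best = min(best, q)
            new_values.append(best)
        values = new_values
    return values


print([round(v, 2) for v in fixed_class_sdp(h=6)])
```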

Finally, note that computing time is reduced because the term $N_{x_t}$ in (23) is replaced by the smaller $\tilde{N}_{x_t}$; however, the exponential growth with the state dimension $n_x$ is not avoided. This is why, with currently available computing power, NSDP can be used only when $n_x$ is at most of the order of ten (Sharma, Jha, & Naresh, 2004). However, some recent experiments (Baglietto, Cervellera, Sanguineti, & Zoppoli, 2006; Cervellera, Chen, & Wen, 2006) have shown that coupling NSDP with state discretization based on low-discrepancy sequences allows for solving problems (on a finite or receding horizon) with even higher state dimension (30 state variables in Cervellera et al. (2006)).

4.1.2. Fixed-class policy

Assume that, for any $t$, the control law belongs to a given class of functions $\{m(\cdot; \theta_t)\}$, where $\theta_t$ is a vector of unknown parameters. Then the optimal control problem (over a finite horizon) can be formulated as

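$$\min_{\theta_0, \ldots, \theta_{h-1}} \Psi_{\varepsilon_1, \ldots, \varepsilon_h}\Big[\Phi\big(g_0(x_0, u_0, \varepsilon_1), \ldots, g_{h-1}(x_{h-1}, u_{h-1}, \varepsilon_h), g_h(x_h)\big)\Big]$$

subject to the constraints (8) and (10), $x_0$ given and

$$u_t = m(x_t; \theta_t).$$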

The same could be done for an infinite-horizon cyclostationary problem, where the unknowns would be the sequence $[\theta_0, \ldots, \theta_{T-1}]$.

The clear advantage of this approach is that the optimal control problem is reduced to an optimization problem that can be solved by means of classical Mathematical Programming techniques (see, among others, Guariso, Rinaldi, and Soncini-Sessa (1985), Orlovski et al. (1984), where complete numerical implementations are also presented) or more recent soft computing optimization approaches such as genetic algorithms (see, among others, Momtahen and Dariane (2007) and references therein) or ant colony optimization (Jalali, Afshar, & Marino, 2006). Its limitation is that the results depend on the choice of the class of functions (e.g. linear, piecewise linear, fuzzy rule base, etc.) to which the control law belongs and, obviously, optimality cannot be guaranteed. Regulation practice often provides indications for this choice: a review of fixed-class approaches based on empirical experience can be found in Oliveira and Loucks (1997). Alternatively, universal approximators (e.g. neural networks) can be used; particularly promising is the approach recently proposed by Baglietto et al. (2006), which uses feedforward neural networks to approximate the control law and a stochastic approximation algorithm to optimize the network parameters (see also the extension proposed by Pianosi and Soncini-Sessa (2008)).
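A fixed-class policy can be sketched in Python as a simulation-based search over $\theta$. Several simplifications are assumed: a single stationary parameter vector $\theta = (a, b)$ for the hypothetical linear rule $u = a + b\,s$ (clipped to admissible values) instead of one $\theta_t$ per time step, $\Psi = \mathbb{E}$ approximated by the average over one synthetic inflow sequence, and plain random search standing in for the Mathematical Programming or evolutionary methods cited above.

```python
import random

S_MAX, U_MAX, DEMAND = 100.0, 40.0, 20.0


def simulate(theta, inflows, s0=50.0):
    """Average step-cost of the stationary linear rule u = a + b * s."""
    a, b = theta
    s, total = s0, 0.0
    for e in inflows:
        u = min(max(a + b * s, 0.0), U_MAX)       # keep the decision admissible
        release = min(u, s + e)
        s = min(s + e - release, S_MAX)
        total += max(DEMAND - release, 0.0) ** 2
    return total / len(inflows)


def random_search(inflows, n_trials=500, seed=0):
    """Pure random search over theta = (a, b); a stand-in for MP or GAs."""
    rng = random.Random(seed)
    best_theta = (DEMAND, 0.0)                    # start from "release the demand"
    best_cost = simulate(best_theta, inflows)
    for _ in range(n_trials):
        cand = (rng.uniform(0.0, U_MAX), rng.uniform(-0.5, 0.5))
        cost = simulate(cand, inflows)
        if cost < best_cost:
            best_theta, best_cost = cand, cost
    return best_theta, best_cost


# Synthetic inflow scenario (hypothetical), used as a sample for Psi = E.
gen = random.Random(42)
inflows = [gen.choice([5.0, 15.0, 40.0]) for _ in range(365)]
theta, J = random_search(inflows)
print(theta, round(J, 2))
```

Because the search keeps the starting rule unless it finds something better, the returned cost can never exceed that of 'always release the demand'; optimality within the class is, of course, not guaranteed.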

4.2. Modifying the water system model

A radical solution for overcoming the curse of dimensionality is to modify the water system model and reduce the dimensionality of its state. The first applications of this idea trace back to the work of Turgeon (1981), who proposed to modify the topology of reservoir networks in order to reduce the number of storage variables. The idea is to replace the $n$-reservoir control problem with $n$ subproblems, each considering two reservoirs: one of the actual reservoirs plus an equivalent reservoir that accounts for all the downstream storages. With this approach, the overall computing time for the solution of the problem grows linearly with $n$. Other authors who have followed this idea are Saad, Turgeon, Bigras, and Duquette (1994), who propose the aggregation of the whole reservoir network into a single storage unit, and Archibald, McKinnon, and Thomas (1997), who suggest a decomposition technique where each subproblem includes an actual reservoir and two equivalent reservoirs for the upstream and downstream storages respectively. With the latter technique, the computational complexity is reduced to a quadratic function of the state dimension. These approaches have proved valuable in practical applications, but no optimality property can be proved. Another approach that suitably exploits particular topological structures has been proposed by Delebecque and Quadrat (1978). They apply singular


perturbation and averaging techniques to a large hydropower reservoir network composed of several valleys with a large seasonal reservoir at the top end and a number of smaller weekly reservoirs downstream. The simplification is operated at the valley level, while the problem for the whole hydropower network is formulated as an SO, multi-DM problem and solved using team theory.
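A back-of-the-envelope comparison of the storage grids involved in full SDP and in a Turgeon-type decomposition is shown below. It counts only discretized storage states (ignoring controls, disturbances and the construction of the equivalent reservoirs) and assumes 100 levels per storage; the function names are hypothetical.

```python
def full_sdp_states(n, N):
    """Joint discretised storage grid of n reservoirs, N levels each."""
    return N ** n


def turgeon_states(n, N):
    """n two-reservoir subproblems (actual + equivalent downstream reservoir)."""
    return n * N ** 2


for n in (3, 5, 10):
    print(n, full_sdp_states(n, 100), turgeon_states(n, 100))
```

With ten reservoirs the joint grid has $10^{20}$ states, whereas the ten two-reservoir subproblems have $10^4$ states each, $10^5$ in total.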

Another possibility is to reduce the state of the system by eliminating the model (4) of the uncontrolled catchments, thus obtaining the so-called reduced model of the water system. In Section 4.2.1 an on-line approach is presented in which outflows from uncontrolled catchments are considered among the system's disturbances and their dynamics are accounted for by solving the problem on-line and updating the disturbance membership-sets or pdfs with real-time information. This information is collected into a vector $I_t$, which includes not only the current state of the uncontrolled catchments but also any variable that is useful for predicting their future outflows, for example precipitation or snow-cover measures. In Section 4.2.2 an off-line, partial model-free approach is presented that allows for completely eliminating the models of the uncontrolled catchments; in this case, the release policy is composed of control laws whose arguments also include some of the components of the information vector $I_t$.

4.2.1. On-line approach

The idea is as follows: models of uncontrolled catchments are eliminated and the outflows $a_{t+1}^k$, $k = 1, \ldots, M$, are included among the disturbances of the reduced water system model. This is possible because these subsystems are not influenced by the control $u_t$. By doing so, the number of components in the state vector (9) is reduced, since the components $y_{t-i}^k$ no longer appear. At each time $t$, an on-line optimal control problem over a finite horizon $[t, t+h]$ is formulated and solved. For each time $\tau$ in the finite horizon $[t, t+h]$, the membership-set $\Xi_\tau$ or pdf $\phi_\tau(\cdot)$ of the disturbance is provided by a dynamic predictor that uses all information $I_t$ available at time $t$. Once the on-line problem has been solved, only the control for the first time step $[t, t+1)$ is actually applied and, at time $t+1$, a new problem is formulated over the horizon $[t+1, t+1+h]$ with new membership-sets or pdfs for the disturbances, based on $I_{t+1}$ (receding horizon principle). As previously anticipated, the information vector $I_t$ contains the state of the catchments at time $t$, and it may also contain uncontrolled exogenous variables, such as measures of precipitation, snow-cover, etc., that are significant for the prediction of catchment outflow. In other words, the on-line updating of outflow membership-sets or pdfs can be based on a model more sophisticated than model (4). In fact, in most cases the description of the uncontrolled catchments provided by model (4) is a rough approximation, but it cannot be improved due to the need to limit the state dimension in the off-line solution with SDP.

The on-line problem can be formulated as:

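1. A deterministic open-loop control problem

$$\min_{u_t, \ldots, u_{t+h-1}} \Phi\big(g_t(x_t, u_t, \bar{\varepsilon}_{t+1}), \ldots, g_{t+h}(x_{t+h})\big)$$

subject to

$$x_{\tau+1} = f_\tau(x_\tau, u_\tau, \bar{\varepsilon}_{\tau+1}), \quad \tau = t, \ldots, t+h-1$$

$$x_t \text{ given}$$

where $x_\tau$ is the reduced state vector, $f_\tau(\cdot)$ is the corresponding state transition function and, for each $\tau = t, \ldots, t+h-1$, $\bar{\varepsilon}_{\tau+1}$ is the expected or maximum value of $\varepsilon_{\tau+1}$ based on $\phi_\tau(\cdot \mid I_t)$ or $\Xi_\tau(I_t)$.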

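2. A stochastic open-loop control problem

$$\min_{u_t, \ldots, u_{t+h-1}} \Psi_{\varepsilon_{t+1}, \ldots, \varepsilon_{t+h}}\Big[\Phi\big(g_t(x_t, u_t, \varepsilon_{t+1}), \ldots, g_{t+h}(x_{t+h})\big)\Big]$$

subject to

$$x_{\tau+1} = f_\tau(x_\tau, u_\tau, \varepsilon_{\tau+1}) \qquad (25a)$$

$$\varepsilon_{\tau+1} \sim \phi_\tau(\cdot \mid I_t) \ \text{ or } \ \varepsilon_{\tau+1} \in \Xi_\tau(I_t) \qquad (25b)$$

$$\tau = t, \ldots, t+h-1 \qquad (25c)$$

$$x_t \text{ given.} \qquad (25d)$$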

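3. A stochastic closed-loop control problem

$$\min_{p} \Psi_{\varepsilon_{t+1}, \ldots, \varepsilon_{t+h}}\Big[\Phi\big(g_t(x_t, u_t, \varepsilon_{t+1}), \ldots, g_{t+h}(x_{t+h})\big)\Big]$$

subject to (25) and

$$u_\tau = m_\tau(x_\tau), \quad \tau = t, \ldots, t+h-1$$

$$p = [m_t(\cdot), \ldots, m_{t+h-1}(\cdot)].$$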

Problem 1 is referred to by Bertsekas (1976) as the Naive Feedback Control (NFC) problem, problem 2 as Open-Loop Feedback Control (OLFC) and problem 3 as Partial Open-Loop Feedback Control (POLFC). Problems 1 and 2 can be solved by means of Mathematical Programming techniques, while problem 3 is solved by means of SDP.
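The receding-horizon mechanics of problems 1 and 2 can be sketched in Python on a hypothetical reservoir with a horizon of three steps and four admissible releases, small enough that both open-loop problems are solved by brute-force enumeration rather than by Mathematical Programming. The predictor `forecast_pdf`, which plays the role of $\phi_\tau(\cdot \mid I_t)$ with $I_t$ reduced to the last observed inflow, the terminal penalty and all numbers are assumptions.

```python
import itertools
import random

S_MAX, DEMAND, HORIZON = 10, 4, 3
CONTROLS = [0, 2, 4, 6]


def forecast_pdf(last_inflow):
    """Hypothetical predictor phi(. | I_t), I_t = last observed inflow."""
    return [(3, 0.3), (6, 0.7)] if last_inflow >= 6 else [(1, 0.5), (3, 0.5)]


def rollout(s, controls, inflows):
    """Phi = sum of step-costs along the horizon plus a penalty g_{t+h}."""
    total = 0.0
    for u, e in zip(controls, inflows):
        release = min(u, s + e)
        s = min(s + e - release, S_MAX)
        total += max(DEMAND - release, 0) ** 2
    return total + max(S_MAX // 2 - s, 0) ** 2


def nfc(s, pdf):
    """Problem 1: disturbances replaced by their expected value."""
    e_bar = sum(e * p for e, p in pdf)
    return min(itertools.product(CONTROLS, repeat=HORIZON),
               key=lambda us: rollout(s, us, [e_bar] * HORIZON))


def olfc(s, pdf):
    """Problem 2: open-loop sequence minimising the expected cost."""
    scenarios = list(itertools.product(pdf, repeat=HORIZON))

    def expected_cost(us):
        total = 0.0
        for sc in scenarios:
            prob = 1.0
            for _, p in sc:
                prob *= p
            total += prob * rollout(s, us, [e for e, _ in sc])
        return total

    return min(itertools.product(CONTROLS, repeat=HORIZON), key=expected_cost)


def receding_horizon(solver, inflow_series, s0=5):
    s, cost = s0, 0.0
    for last, e in zip(inflow_series, inflow_series[1:]):
        u = solver(s, forecast_pdf(last))[0]   # apply only the first control
        release = min(u, s + e)                # nature then reveals the inflow e
        s = min(s + e - release, S_MAX)
        cost += max(DEMAND - release, 0) ** 2
    return cost


gen = random.Random(1)
series = [3]
for _ in range(30):
    pdf = forecast_pdf(series[-1])
    series.append(gen.choices([e for e, _ in pdf], weights=[p for _, p in pdf])[0])

print(receding_horizon(nfc, series), receding_horizon(olfc, series))
```

Only the first control of each optimized sequence is applied, and the next call uses the pdf updated with the newly observed inflow. On a single inflow realization either scheme may come out ahead, so a meaningful comparison would require many simulated sequences.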

For all problems, one of the main difficulties is the choice of the penalty function $g_h(\cdot)$, which influences both the performance of the closed-loop scheme and its stability (Mayne, Rawlings, Rao, & Scokaert, 2000). One possibility (Nardini, Piccardi, & Soncini-Sessa, 1994) is to let $g_h(\cdot)$ be equal to the optimal cost-to-go $H_h(\cdot)$ obtained by solving an off-line infinite-horizon problem with the reduced model and a trivial predictor, i.e. with an a priori pdf or membership-set for the description of the disturbance. However, since the solution of the latter problem requires SDP, this approach can be followed only if the reservoir network is composed of a few reservoirs. An application of the POLFC scheme to a real-world case study can be found in Castelletti, de Rigo, Soncini-Sessa, Tepsich, and Weber (2008).

As for optimality, it is well known from the certainty equivalence principle that, when this principle applies (linear dynamics, quadratic cost and additive disturbances), the solution of problem 1 coincides with the optimal solution of the off-line closed-loop problem with the complete model (8)–(10), i.e. of the original problem. Bertsekas (1976) proved that, independently of the form of the model, the solution of problem 3 cannot be worse than the solution of the off-line open-loop problem with the complete model. As for the other problems, it is reasonable to expect performance to improve when passing from problem 1 to problem 3, but there exist cases in which the solution to problem 2 is better than that of problem 3.


Finally, note that the Extended Linear Quadratic Gaussian (ELQG) approach proposed by Georgakakos (1989) and Georgakakos and Marks (1987), which has been widely adopted and recognized in reservoir management practice, is an algorithm for the resolution of problem 2. Because of its name, it is sometimes incorrectly cited as a variation of the traditional LQG approach introduced in the previous section. On the contrary, it has been proposed for, and is suited to, constrained, non-linear models and on-line resolution. The underlying idea is to simulate the system subject to a control trajectory $u_t^{(i)}, \ldots, u_{t+h-1}^{(i)}$, obtain the corresponding trajectory of the expected value of the state, linearize the model around that trajectory and apply the Newton method to obtain a new control trajectory $u_t^{(i+1)}, \ldots, u_{t+h-1}^{(i+1)}$, until convergence is reached. The method allows for the introduction of reliability constraints on the state and the control; the former are accounted for by increasing the value of the cost function when the constraints are violated, the latter by projecting the Newton descent direction onto the feasible control set.

4.2.2. Off-line partial model-free approach

The only way to use the reduced model of the system also in off-line policy design, without resorting to the unrealistic assumption that outflows from uncontrolled catchments are purely random disturbances, is to use a solution approach based on Reinforcement Learning (see Barto and Sutton (1998), Kaelbling, Littman, and Moore (1996)). With this approach, the control law depends on the reduced state vector $x_t$ and on a reduced information vector $\bar{I}_t$, made up of those components of the information vector $I_t$ that the Analyst considers to play a key role in the outflow formation process. Reinforcement Learning is based on the idea of designing the policy through a trial-and-error learning process, in which model-based estimates of the system transitions are replaced by direct observations of the real system evolution (model-free). Precisely, alternative controls are experimented on-line, the corresponding effects on the system outputs are directly observed and the Q-factor is updated (Q-learning by Watkins and Dayan (1992)). The Q-factor is somewhat analogous to the optimal cost-to-go and is associated with the quadruple $(t, x_t, u_t, \bar{I}_t)$. Unfortunately, on-line experiments cannot be performed on real-world reservoirs, as this may result in unacceptable social costs (all the controls, even those producing disastrous effects, have to be experimented, as their effects can only be evaluated ex post) and the learning process would take too much time. To overcome this hurdle, a mixed, partial model-free approach has recently been proposed (Castelletti, Corani, Rizzoli, Soncini-Sessa, & Weber, 2001). It combines the model-free approach of Q-learning with SDP-based off-line policy design. In the learning process, registered time series of $(\bar{I}_t, \bar{I}_{t+1}, a_{t+1}^k)$ are used as if they were produced on-line by nature, while the other parts of the water system are described with the reduced model.
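A minimal Python sketch of the partial model-free idea is tabular Q-learning in which the reservoir is simulated with the reduced model while the inflows are replayed from a record, so that exploration never touches the real system. Assumptions: the reduced information vector $\bar{I}_t$ is just the last recorded inflow, the Q-factor is stationary (the time index is dropped) and discounted with $\gamma = 0.9$, and the 'record' is a synthetic series standing in for registered data. This is a generic Q-learning sketch, not the algorithm of Castelletti et al. (2001), which combines Q-learning with SDP-based design.

```python
import random

S_MAX, DEMAND, GAMMA = 10, 4, 0.9
CONTROLS = [0, 2, 4, 6]


def reduced_model(s, u, inflow):
    """Model-based part: reservoir mass balance and step-cost."""
    release = min(u, s + inflow)
    s_next = min(s + inflow - release, S_MAX)
    return s_next, max(DEMAND - release, 0) ** 2


def q_learning(record, epochs=100, alpha=0.1, explore=0.2, seed=0):
    """Tabular Q-learning driven by a recorded inflow series.

    The reduced information vector is the last recorded inflow (hypothetical
    choice); the time index is dropped, so Q is stationary.
    """
    rng = random.Random(seed)
    Q = {}

    def q(s, i, u):
        return Q.get((s, i, u), 0.0)

    for _ in range(epochs):
        s = S_MAX // 2
        for i_t, a_next in zip(record, record[1:]):   # (I_t, a_{t+1}); I_{t+1} = a_{t+1}
            if rng.random() < explore:                # exploration happens in simulation only
                u = rng.choice(CONTROLS)
            else:
                u = min(CONTROLS, key=lambda c: q(s, i_t, c))
            s_next, g = reduced_model(s, u, a_next)
            target = g + GAMMA * min(q(s_next, a_next, c) for c in CONTROLS)
            Q[(s, i_t, u)] = q(s, i_t, u) + alpha * (target - q(s, i_t, u))
            s = s_next
    return Q


def greedy_release(Q, s, i):
    return min(CONTROLS, key=lambda c: Q.get((s, i, c), 0.0))


# Synthetic stand-in for a registered inflow record (persistent wet/dry spells).
gen = random.Random(7)
record = [3]
for _ in range(199):
    record.append(record[-1] if gen.random() < 0.7 else gen.choice([1, 3, 6]))

Q = q_learning(record)
print(greedy_release(Q, S_MAX // 2, 6), greedy_release(Q, S_MAX // 2, 1))
```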

5. Concluding remarks

Although the problem of designing efficient water reservoir management policies has been extensively studied over the years in many disciplines, ranging from Hydrology through Decision Theory to Electrical Engineering, it is still a very intriguing research theme. This paper reviewed some of the recent and, in the authors' opinion, more significant advances in policy design from a Control Theory perspective. The focus was mainly on the implications that the very nature of the storage systems has on the formulation and solution of the control problem.

The problem has many other facets that have not been dealt with in the paper but are worth mentioning.

When new water reservoir networks are being planned, the control problem discussed in the paper has to be nested in a mathematical programming problem whose arguments are the planning variables (e.g. number and capacity of reservoirs).

In the normal, real-time management of water reservoir networks, a number of changes may occur in system conditions (e.g. down-time periods for some hydropower units, irrigation canals under maintenance, etc.) that could not be accounted for in designing the off-line policy and thus require it to be modified. This topic is not dealt with in the paper, but the on-line approaches discussed in Section 4.2.1 are well suited for this purpose, as they can be viewed as adaptive control schemes. Each time relevant changes occur in the system, one can switch from the off-line policy to an on-line policy, computed as explained in that section, and then re-adopt the off-line policy once normal system conditions are restored.

In a multipurpose and multistakeholder context, the choice of the policy to adopt from the set of efficient policies is the final step of a complex, often recursive, decision-making process that involves many different phases: from stakeholder analysis, through system model identification and policy design itself, to the comparison and negotiation of efficient policies. Activities within these phases involve skills from Systems and Control Theory as well as Decision Making, Hydrology, Sociology and Alternative Dispute Resolution, and require full stakeholder involvement and integration among different and disparate issues. They have to be organized in a procedure (Castelletti & Soncini-Sessa, 2006) and supported by proper computer tools, namely Multi-Objective Decision Support Systems (see, among others, Liu and Stewart (2004), Nandalal and Simonovic (2002), Salewicz and Nakayama (2004), Soncini-Sessa, Rizzoli, Villa, and Weber (1999)).

Acknowledgment

Partially supported by FONDAZIONE CARIPLO TWOLE-2004.

References

Archibald, T. W., McKinnon, K. I. M., & Thomas, L. C. (1997). An aggregate stochastic dynamic programming model of multireservoir systems. Water Resources Research, 33(2), 333–340.

Aufiero, A., Soncini-Sessa, R., & Weber, E. (2001). Set-valued control laws in minmax control problem. In Proceedings of IFAC workshop on modelling and control in environmental issues.


Aufiero, A., Soncini-Sessa, R., & Weber, E. (2002). Set-valued control laws in TEV-DC control problem. In Proceedings of 15th IFAC world congress on automatic control.

Baglietto, M., Cervellera, C., Sanguineti, M., & Zoppoli, R. (2006). Water reservoirs management under uncertainty by approximating networks and learning from data. In Topics on system analysis and integrated water resource management. Amsterdam: Elsevier.

Barto, A., & Sutton, R. (1998). Reinforcement learning: An introduction. Boston: MIT Press.

Bellman, R. E. (1957). Dynamic programming. Princeton: Princeton University Press.

Bellman, R. E., & Dreyfus, S. (1962). Applied dynamic programming. Princeton: Princeton University Press.

Bellman, R. E., Kalaba, R., & Kotkin, B. (1963). Polynomial approximation - a new computational technique in dynamic programming. Mathematics of Computation, 17(8), 155–161.

Bertsekas, D. P. (1976). Dynamic programming and stochastic control. New York: Academic Press.

Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Boston: Athena Scientific.

Breton, A., Haurie, A., & Kalocsai, R. (1978). Efficient management of interconnected power systems: A game theoretic approach. Automatica, 14, 443–452.

Brown, L. R. (2001). How water scarcity will shape the new century. Water Science and Technology, 43(4), 17–22.

Castelletti, A., Corani, G., Rizzoli, A. E., Soncini-Sessa, R., & Weber, E. (2001). A reinforcement learning approach for the operational management of a water system. In Proceedings of IFAC workshop modelling and control in environmental issues. Yokohama: Elsevier.

Castelletti, A., de Rigo, D., Rizzoli, A. E., Soncini-Sessa, R., & Weber, E. (2005). An improved technique for neuro-dynamic programming applied to the efficient and integrated water resources management. In 16th IFAC world congress.

Castelletti, A., de Rigo, D., Rizzoli, A. E., Soncini-Sessa, R., & Weber, E. (2007). Neuro-dynamic programming for designing water reservoir network management policies. Control Engineering Practice, 15(8), 1001–1011.

Castelletti, A., de Rigo, D., Soncini-Sessa, R., Tepsich, L., & Weber, E. (2008). On-line design of water reservoir policies based on inflow prediction. In 17th IFAC world congress.

Castelletti, A., & Soncini-Sessa, R. (2006). A procedural approach to strengthening integration and participation in water resource planning. Environmental Modelling & Software, 21(10), 1455–1470.

Cervellera, C., Chen, V. C. P., & Wen, A. (2006). Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research, 171(3), 1139–1151.

Cervellera, C., & Muselli, M. (2004). Deterministic design for neural network learning: An approach based on discrepancy. IEEE Transactions on Neural Networks, 15(3), 533–544.

Delebecque, F., & Quadrat, J. P. (1978). Contribution of stochastic control singular perturbation averaging and team theories to an example of large-scale systems: The management of hydropower production. IEEE Transactions on Automatic Control, 23(2), 209–222.

Esogbue, A. O. (1989). Dynamic programming and water resources: Origins and interconnections. In Dynamic programming for optimal water resources systems analysis. Englewood Cliffs: Prentice-Hall.

Fang, K. T., & Wang, Y. (1994). Number-theoretic methods in statistics. London: Chapman & Hall.

Fearnside, P. M. (2004). Greenhouse gas emissions from hydroelectric dams: Controversies provide a springboard for rethinking a supposedly 'clean' energy source. Climatic Change, 66, 1–8.

Foufoula-Georgiou, E., & Kitanidis, P. K. (1988). Gradient dynamicprogramming for stochastic optimal control of multidimensional waterresources systems. Water Resources Research, 24, 1345–1359.

Fults, D. M., & Hancock, L. F. (1972). Optimal operations models for Shasta-Trinity system. Journal of the Hydraulics Division, ASCE, 98, 1497–1514.

Georgakakos, A. P. (1989). Extended Linear Quadratic Gaussian (ELQG) control: Further extensions. Water Resources Research, 25(2), 191–201.

Georgakakos, A. P., & Marks, D. H. (1987). A new method for real-time operation of reservoir systems. Water Resources Research, 23(7), 1376–1390.

Gilbert, K. C., & Shane, R. M. (1982). TVA hydroscheduling model: Theoretical aspects. Journal of Water Resources Planning and Management, ASCE, 108(1), 21–36.

Guariso, G., Rinaldi, S., & Soncini-Sessa, R. (1985). A decision support system for water management: The Lake Como case study. European Journal of Operational Research, 21, 295–306.

GWP-Global Water Partnership (2000). Integrated water resources management. TAC Background paper 4, GWP Secretariat, Stockholm.

Hall, W. A., & Buras, N. (1961). The dynamic programming approach to water resources development. Journal of Geophysical Research, 66(2), 510–520.

Hall, W. A., Butcher, W. S., & Esogbue, A. (1968). Optimization of the operation of a multi-purpose reservoir by dynamic programming. Water Resources Research, 4(3), 471–477.

Heidari, M., Chow, V. T., Kokotovic, P. V., & Meredith, D. (1971). Discrete differential dynamic programming approach to water resources systems optimisation. Water Resources Research, 7(2), 273–282.

Hooper, E. R., Georgakakos, A. P., & Lettenmaier, D. P. (1991). Optimal stochastic operation of Salt River Project, Arizona. Journal of Water Resources Planning and Management, ASCE, 117(5), 556–587.

Jalali, M. R., Afshar, A., & Marino, M. A. (2006). Reservoir operation by ant colony optimization algorithms. Iranian Journal of Science & Technology, 30, 107–117.

Johnson, S. A., Stedinger, J. R., Shoemaker, C., Li, Y., & Tejada-Guibert, J. A. (1993). Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Operations Research, 41, 484–500.

Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.

Kaufmann, A., & Cruon, R. (1967). Dynamic programming. New York: Academic Press.

Khagram, S. (2004). Dams and development: Transnational struggles for water and power. Ithaca: Cornell University Press.

Kotchen, M. J., Moore, M. R., Lupi, F., & Rutherford, E. S. (2006). Environmental constraints on hydropower: An ex post benefit-cost analysis of dam relicensing in Michigan. Land Economics, 82(3), 384–403.

Larson, R. E. (1968). State incremental dynamic programming. New York: American Elsevier.

Liu, D., & Stewart, T. J. (2004). Object-oriented decision support system modelling for multicriteria decision making in natural resource management. Computers & Operations Research, 31, 985–999.

Lotov, A. V., Bushenkov, V. A., & Kamenev, G. K. (2004). Interactive decision maps: Approximation and visualization of Pareto frontier. Heidelberg: Springer-Verlag.

Luenberger, D. G. (1971). Cyclic dynamic programming: A procedure for problems with fixed delay. Operations Research, 19(4), 1101–1110.

Maas, A., Hufschmidt, M. M., Dorfman, R., Thomas, H. A., Marglin, S. A., & Fair, G. M. (1962). Design of water resource systems. Boston, MA: Harvard University Press.

Mayne, D. Q., Rawlings, J. B., Rao, C. V., & Scokaert, P. O. M. (2000). Constrained model predictive control: Stability and optimality. Automatica, 36, 789–814.

McCully, P. (2001). Silenced rivers. London: Zed Books.

McLaughlin, D., & Velasco, H. L. (1990). Real-time control of a system of large hydropower reservoirs. Water Resources Research, 26(4), 623–635.

Miettinen, K. (1999). Nonlinear multiobjective optimization. Dordrecht: Kluwer Academic Publishers.

Momtahen, Sh., & Dariane, A. B. (2007). Direct search approaches using genetic algorithms for optimization of water reservoir operating policies. Journal of Water Resources Planning and Management, 133(3), 202–209.

Nandalal, K. D. W., & Simonovic, S. P. (2002). State of the art report on system analysis methods for resolution of conflicts in water resources management. Div. of Water Sciences, UNESCO.

Nardini, A., Piccardi, C., & Soncini-Sessa, R. (1994). A decomposition approach to suboptimal control of discrete-time systems. Optimal Control Applications and Methods, 15(1), 1–12.


Niederreiter, H. (1992). Random number generation and quasi-Monte Carlo methods. Philadelphia: SIAM.

Oliveira, R., & Loucks, D. P. (1997). Operating rules for multireservoir systems. Water Resources Research, 33(4), 839–852.

Orlovski, S., Rinaldi, S., & Soncini-Sessa, R. (1983). A min max approach to storage control problems. Applied Mathematics and Computation, 12(2–3), 237–254.

Orlovski, S., Rinaldi, S., & Soncini-Sessa, R. (1984). A min max approach to reservoir management. Water Resources Research, 20(11), 1506–1514.

Ozelkan, E. C., Galambosi, A., Fernandes, E., & Duckstein, L. (1997). Linear quadratic dynamic programming for water reservoir management. Applied Mathematical Modelling, 21, 591–598.

Pianosi, F., & Soncini-Sessa, R. (2008). Extended Ritz method for reservoir management over an infinite horizon. In 17th IFAC world congress.

Piccardi, C. (1993a). Infinite-horizon minimax control with pointwise cost function. Journal of Optimization Theory and Applications, 78, 317–336.

Piccardi, C. (1993b). Infinite-horizon periodic minimax control problem. Journal of Optimization Theory and Applications, 79, 397–404.

Piccardi, C., & Soncini-Sessa, R. (1991). Stochastic dynamic programming for reservoir optimal control: Dense discretization and inflow correlation assumption made possible by parallel computing. Water Resources Research, 27(5), 729–741.

Read, E. G. (1989). A dual approach to stochastic dynamic programming for reservoir release scheduling. In Dynamic programming for optimal water resources systems analysis (pp. 361–372). Englewood Cliffs: Prentice-Hall.

Rippl, W. (1883). The capacity of storage reservoirs for water supply. Minutes of Proceedings, Institution of Civil Engineers, 71, 270–278.

Rosa, L. P., Santos, M. A., Matvienko, B., Santos, E. O., & Sikar, E. (2004). Greenhouse gas emissions from hydroelectric reservoirs in tropical regions. Climatic Change, 66, 9–21.

Saad, M., Turgeon, A., Bigras, P., & Duquette, R. (1994). Learning disaggregation technique for the operation of long-term hydroelectric power systems. Water Resources Research, 30(11), 3195–3203.

Salewicz, K. A., & Nakayama, M. (2004). Development of a web-based decision support system (DSS) for managing large international rivers. Global Environmental Change, 14, 25–37.

Sharma, V., Jha, R., & Naresh, R. (2004). Optimal multi-reservoir network control by two-phase neural network. Electric Power Systems Research, 68, 221–228.

Sniedovich, M. (1979). Reliability-constrained reservoir control problems: 1. Methodological issues. Water Resources Research, 15(6), 1574–1582.

Soncini-Sessa, R., Castelletti, A., & Weber, E. (2007). Integrated and participatory water resources management. Theory. Amsterdam: Elsevier.

Soncini-Sessa, R., Rizzoli, A. E., Villa, L., & Weber, E. (1999). TwoLe: A software tool for planning and management of water reservoir networks. Hydrological Sciences Journal, 44(4), 619–631.

Soncini-Sessa, R., Zuleta, J., & Piccardi, C. (1991). Remarks on the application of a risk-averse approach to the management of El-Carrizal reservoir. Advances in Water Resources, 13(2), 76–84.

Su, Y. S., & Deininger, R. A. (1972). Generalization of White’s method of successive approximations. Operations Research, 20(2), 318–326.

Su, Y. S., & Deininger, R. A. (1974). Modeling regulation of Lake Superior under uncertainty of future water supplies. Water Resources Research, 10(1), 11–25.

Tauxe, G. V., Inman, R. R., & Mades, D. M. (1979). Multiobjective dynamic programming with application to a reservoir. Water Resources Research, 15(6), 1403–1408.

Tejada-Guibert, J. A., Johnson, S. A., & Stedinger, J. R. (1995). The value of hydrologic information in stochastic dynamic programming models of a multireservoir system. Water Resources Research, 31(10), 2571–2579.

Thompson, M., Davison, M., & Rasmussen, H. (2004). Valuation and optimal operation of electric power plants in competitive markets. Operations Research, 52(4), 546–562.

Trott, W. J., & Yeh, W. (1973). Optimization of multiple reservoir systems. Journal of the Hydraulics Division, ASCE, 99, 1865–1884.

Tsitsiklis, J. N., & Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59–94.

Turgeon, A. (1980). Optimal operation of multi-reservoir power systems with stochastic inflows. Water Resources Research, 16(2), 275–283.

Turgeon, A. (1981). A decomposition method for the long-term scheduling of reservoirs in series. Water Resources Research, 17(6), 1565–1570.

Vasiliadis, H. V., & Karamouz, M. (1994). Demand-driven operation of reservoirs using uncertainty-based optimal operating policies. Journal of Water Resources Planning and Management, ASCE, 120(1), 101–114.

Wallach, D., Makowski, D., & Jones, J. (2006). Working with dynamic crop models. Evaluation, analysis, parameterization, and applications. Amsterdam: Elsevier.

Wasimi, S. A., & Kitanidis, P. K. (1983). Real-time forecasting and daily operation of a multireservoir system during floods by Linear Quadratic Gaussian control. Water Resources Research, 19(6), 1511–1522.

Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.

Weber, E., Rizzoli, A. E., Soncini-Sessa, R., & Castelletti, A. (2002). Lexicographic optimisation for water resources planning: The case of Lake Verbano, Italy. In A. E. Rizzoli & A. J. Jakeman (Eds.), Integrated assessment and decision support, proceedings of 1st biennial meeting of IEMSS.

White, D. J. (1963). Dynamic programming, Markov chains, and the method of successive approximations. Journal of Mathematical Analysis and Applications, 6, 373–376.

Wong, P. J., & Luenberger, D. G. (1968). Reducing the memory requirements of dynamic programming. Operations Research, 16(6), 1115–1125.

World Commission on Dams (2000). Dams and development: A new framework for decision-making. London, UK: Earthscan Publications Ltd.

Yakowitz, S. (1982). Dynamic programming applications in water resources. Water Resources Research, 18(4), 673–696.

Yeh, W. (1985). Reservoir management and operations models: A state-of-the-art review. Water Resources Research, 21(12), 1797–1818.

Andrea Castelletti was born in Genova in 1974. He received an MS degree in Environmental Engineering and a Ph.D. in Information Engineering from Politecnico di Milano, Italy, in 1999 and 2005, respectively. Since 2006 he has been Assistant Professor of Modelling and Control of Environmental Systems at the same university and, since 2008, Honorary Research Fellow at the Centre for Water Research of the University of Western Australia. His main research interests focus on participatory and integrated modelling and control of environmental systems, namely water resource systems, and Decision Support System design. He has co-authored two international books on integrated water resource management and more than 20 papers in international journals and conference proceedings. He is currently a member of the IFAC Technical Committee on Modelling and Control of Environmental Systems (TC 8.3).

Francesca Pianosi was born in Milan in 1980. She received an MS degree in Environmental Engineering from Politecnico di Milano, Italy, in 2004 and a Ph.D. in Information Engineering in 2008. Her research interests focus on modelling and control of environmental systems, in particular time series analysis and stochastic optimal control of water systems. She is co-author of an international book on integrated water resource management.

Rodolfo Soncini-Sessa was born in 1948 in Milan. In 1972 he received a Master in Electronic Engineering from the Politecnico di Milano, Italy. He was associate professor of Modelling and Control of Natural Resources (1982–1986, Politecnico di Milano) and full professor of Automatic Control (1986–1990, Università di Brescia), before becoming full professor of Natural Resources Management at the Politecnico di Milano in 1990. He has been invited to the International Institute for Applied Systems Analysis (IIASA) in Austria for several research periods. His main research interests are the design of Decision Support Systems (DSS) for integrated and participatory decision making in the field of water resources, with attention to both the quality and the quantity of water. He is chair of the IFAC Technical Committee on Modelling & Control of Environmental Systems and serves on the Editorial Boards of Water International and Environmental Modelling & Software. He is author or co-author of several books and many papers.