On nonsmooth and discontinuous problems of stochastic systems optimization


Reprinted from the European Journal of Operational Research 101 (1997) 230-244



Yuri M. Ermoliev a,*, Vladimir I. Norkin b

a International Institute for Applied Systems Analysis, A-2361 Laxenburg, Austria

b Glushkov Institute of Cybernetics, 252 207 Kiev, Ukraine

Received 1 March 1996; revised 1 October 1996

Abstract

A class of stochastic optimization problems is analyzed that cannot be solved by deterministic and standard stochastic approximation methods. We consider risk-control problems, optimization of stochastic networks and discrete event systems, screening irreversible changes, and pollution control. The results of Ermoliev et al. are extended to the case of stochastic systems and general constraints. It is shown that the concept of the stochastic mollifier gradient leads to easily implementable computational procedures for systems with Lipschitz and discontinuous objective functions. New optimality conditions are formulated for designing stochastic search procedures for constrained optimization of discontinuous systems. © 1997 Elsevier Science B.V.

Keywords: Stochastic approximation; Deterministic counterpart; Discontinuity; Theory of distributions; Subgradients; Stochastic quasi-gradients; Networks; Risk; Mollifiers

1. Introduction

The search for stability, efficiency, balance and equilibrium is a natural feature of anthropogenic systems. Optimization tools are needed during various stages of the process: for example, in the collection and reconciliation of data, in the identification of parameters, in sensitivity analysis, and in policy assessment. Smooth (classical) optimization techniques have been influenced by applications in mechanics, physics, and statistics. The analysis of anthropogenic systems with complex and often dramatic interactions between man, nature, and technology calls for new approaches that do not rely on the smooth behavior of the system or exact information on its performance.

* Corresponding author. Fax: +43-2236 73147; e-mail: [email protected]

In this paper we analyze problems arising in the optimization of complex stochastic systems exhibiting nonsmooth behavior, abrupt responses, and possibly catastrophic changes. Nonsmooth and discontinuous behavior is typical for systems undergoing structural changes and new developments. Discontinuity is an inherent feature of systems with discrete variables (indivisibilities), such as manufacturing systems, communication networks, and neural nets. In impulse control, the discontinuity (the size of a jump) is itself a control variable. The lack of scientific information on gradual changes of a system forces analysts to deal with data-based models in which actual changes are represented as transformations between observable states in a discrete set. In risk control, the possibility

0377-2217/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0377-2217(96)00395-5


of an abrupt change is by nature present in the problem. Failure may trigger jumps of the system from one state to another, and the main dilemma for a control policy is to 'hit or miss' an appropriate point in the evolution of the system to prevent possible losses and irreversibility.

The concept of nonsmooth and abrupt change is emphasized in the study of environmental systems by such notions as critical load, surprise, and chemical time bomb phenomena. There are excellent reviews of discontinuous, imperfectly reversible change in ecological systems (Holling [19]) and sociotechnical systems (Brooks [3]). The significance of 'extreme events' arguments in climate impact studies was emphasized by Parry [29] and has been summarized by Wigley [35] as follows: "Impacts accrue (...) not so much from slow fluctuations in the mean, but from the tails of the distributions, from extreme events. In many cases, an extreme can be defined as an event where a (...) variable exceeds some 'threshold'". Clark [5] argued that such nonlinearity requires risk-based approaches to assess and control possible impacts and that the deviation of extremes from threshold levels may be important.

There are a number of methodological challenges involved in the control of abruptly changing (nonsmooth) stochastic systems. One obvious obstacle is the lack of scientific information on relationships and thresholds. A less-known challenge is the lack of analytical tools to assess the propagation of abrupt changes and related risks through the system and the interactive roles played by uncertainties, changes, and policy responses across spatial and temporal scales.

In this article we deal with some of these challenges, which call for new optimization tools. The behavior of a nonsmooth system at local points cannot be predicted (in contrast to classical smooth systems) even outside an arbitrarily small neighborhood. The main idea is to develop approaches that rely on a 'global view' of system behavior or, as Ho [18] argued, a bird's-eye view of system responses. Our discussion primarily concentrates on the concept of mollifier subgradients, which provides such a technique (see Ermoliev et al. [9]).

In Section 2 we analyze the shortcomings of existing approaches by using some important classes of stochastic systems with nonsmooth performance functions. Two types of discontinuities are distinguished: discontinuities of sample performance functions and discontinuities of expectation functions. Section 3 introduces the complexity of nonsmooth problems even in cases where the interchange of integration and differentiation operations is possible. This situation imposes obstacles to deterministic and standard stochastic approximation methods. We show the way in which the concept of mollifier subgradients enables us to use finite-difference approximation-type procedures for locally Lipschitz and discontinuous functions. This section, along with Section 4, also discusses extensions of the infinitesimal perturbation analysis of discrete event systems (Ho and Cao [17]; Suri [33]) to nonsmooth expectations. In Section 4, notions of mollifier subgradients and cosmic convergence (Rockafellar and Wets [31]) are used to formulate optimality conditions for discontinuous problems in a form that permits stochastic search procedures under rather general constraints; proofs are given in Appendix A. The concluding remarks in Section 5 indicate some numerical experiments and discuss directions for further research.

2. Nonsmooth stochastic systems

2.1. Hit or miss control policy: basic optimization procedures

The main difficulties in the optimization of discontinuous systems are easily illustrated by simple examples of 'hit or miss' decision problems arising in risk control. Assume that, at some point in the evolution of a system (ecosystem, nuclear power plant, economic system), a policymaker's decision not to intervene in the ongoing processes leads to a 'failure' with considerable and possibly irreversible damages. Suppose that the system can be used during the time interval [0, T], but the actual lifetime τ may be shorter: if we do not shut down the system at time x ≤ T, there may be a failure at w < x; hence τ = min(x, w). The profit of the system without failure is proportional to the operating time, but a failure at w < x leads to high losses at cost b. Suppose w is distributed on the interval [0, T] with a continuous density function p(w), and the sample performance function is defined as

f(x, w) = −ax,      if 0 ≤ x ≤ w,
f(x, w) = b − aw,   if w < x ≤ T.


The sample performance function f(x, w) is discontinuous with respect to both variables. The expected cost (performance) function, which can serve as a risk indicator for a solution x ∈ [0, T], has the form of the expectation function

F(x) = E f(x, w) = E[ −ax I_{[x,T]}(w) + (b − aw) I_{[0,x)}(w) ],   (1)

where I_A = I_A(w) is the indicator function of the event A: I_A(w) = 1 if w ∈ A and I_A(w) = 0 otherwise.

The minimization of F(x) is an example of a stochastic optimization problem (see Ermoliev and Wets [11]). In general, F(x) has the form of a multiple integral with an implicitly given probability distribution and random performance function f(x, w).

Let us now use function (1) to outline possible approaches to the optimization of stochastic systems. The main problem is the lack of exact information on F(x). A common approach is to approximate the expectation F(x) by the sample mean

F^N(x) = (1/N) Σ_{k=1}^N f(x, w^k),   (2)

where w^k, k = 1, ..., N, are independent samples of w. Thus the original problem with the expectation function F(x) is approximated by a deterministic problem with objective function F^N(x), which could be solved, if possible, by a broad variety of deterministic methods. This approach has several shortcomings.

1. It cannot be used when the underlying probability distribution depends on the decision variable x or when the functions f(x, w) are given implicitly.

2. As in problem (1), the sample performance function f(·, w^k) is discontinuous, although the expectation F(x) may be a continuously differentiable function. Since the functions f(·, w^k), k = 1, ..., N, are discontinuous, the function F^N(x) is also discontinuous at each point x = w^k. The number of jumps (and local optimal solutions) tends to infinity as N → ∞, although the original function F(x) is smooth.

3. The empirical approximation F^N(x) may even destroy the convexity of F(x). For example, the sample mean approximation of the function

where w = (a_i, b_i) are normally distributed random variables with Ea_i > 0, may be nonconvex.

4. The convergence of min F^N(x) to min F(x) as N → ∞ is established in practically all important cases. Despite this, the use of gradient-type procedures for minimizing F^N(x) may result in local solutions that have nothing in common with local solutions of the original problem. This occurs in cases where the interchange of differentiation and expectation operators is impossible (see Eq. (4) and Section 3): the gradient

∇F^N(x) = (1/N) Σ_{k=1}^N ∇f(x, w^k)

does not approximate ∇F(x).

Nevertheless, a remarkable feature of performance function (1) is that it can be successfully utilized in the design of the solution strategy: despite the discontinuity of f(x, w), the function F(x) is continuous and smooth. The function F(x) may also be convex, although f(x, w) is not convex for some w. Therefore it is advantageous to use stochastic search procedures dealing directly with the original function F(x):

x^{k+1} = x^k − ρ_k ξ^k,   k = 0, 1, ...,   (3)

where ξ^k is generally a biased statistical estimate (stochastic quasi-gradient) of the gradient ∇F(x^k) at the current point x^k, and ρ_k is a step-size multiplier. Unbiased estimates ξ^k are also called stochastic gradients (or subgradients) and generalized gradients, depending on whether F(x) is a continuously differentiable or nonsmooth function.
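A procedure of type (3) can be sketched on a deliberately simple smooth problem; the objective F(x) = E(x − w)² with w ~ N(0, 1), whose minimizer is Ew = 0, is our illustrative assumption:

```python
import random

# Stochastic quasi-gradient iteration (3) on a toy smooth problem.
random.seed(1)

def stochastic_gradient(x, w):
    # Unbiased estimate of F'(x) = 2*(x - Ew): here g(x, w) = 2*(x - w).
    return 2.0 * (x - w)

x = 5.0
for k in range(1, 20001):
    rho_k = 1.0 / k                  # diminishing step-size multipliers
    x = x - rho_k * stochastic_gradient(x, random.gauss(0.0, 1.0))
print(x)  # approaches the minimizer 0
```

The point of the paper is that such iterations remain usable when F is only Lipschitz or discontinuous, provided ξ^k is built from mollifier (smoothed) gradients.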

Let us note that the gradient ∇f(·, w) of the sample performance function f(·, w) exists everywhere except for x = w. Define

g(x, w) = ∇f(x, w) for x ≠ w,   g(x, w) = 0 for x = w.

At points x where ∇f(x, w) exists, g(x, w) = ∇f(x, w). Obviously, the expectation Eg(x, w) exists, but the 'interchange formula' for the gradient and the mathematical expectation is not valid: ∇F(x) ≠ E[g(x, w)]. Indeed, the direct differentiation of function (1) yields

∇F(x) = E g(x, w) + [f(x, x−0) − f(x, x+0)] p(x),   (4)

where f(x, x±0) = lim_{y→x±0} f(x, y) and p(·) is a continuous density for w. Therefore the discontinuity


of f(x, w) results in an additional term in ∇F(x), and we have the following unbiased estimate of the gradient ∇F(x):

The estimate ξ (stochastic gradient) can be used in stochastic methods (3). If F(x) is a twice differentiable function, then it is possible to use the single run stochastic finite-difference approximation of ∇F(x) at x = x^k:

ξ^k = Σ_{j=1}^n [ ( f(x^k + Δ_k e^j, w^k) − f(x^k, w^k) ) / Δ_k ] e^j,   (5)

or the standard stochastic approximation procedure with

ξ^k = Σ_{j=1}^n [ ( f(x^k + Δ_k e^j, w^{kj}) − f(x^k, w^{k0}) ) / Δ_k ] e^j,   (6)

where e^j is the j-th coordinate vector; w^k, w^{k0}, w^{k1}, ..., w^{kn} are independent samples of w; and Δ_k → 0. Unfortunately, besides strong requirements on the differentiability of F(x), the variance of stochastic quasi-gradient (6) tends to infinity as k → ∞. In contrast, the variance of single run estimate (5) for smooth f(·, w) tends to 0 as k → ∞. For nonsmooth f(·, w), estimate (5) also leads to a reduction of the variance after the introduction of smoothing effects by adding auxiliary random variables, as discussed in Section 3 (see Ermoliev and Gaivoronski [7]). From the general idea of mollifier subgradients (Ermoliev et al. [9]) it roughly follows that for nondifferentiable F(x) the finite difference approximations (5) and (6) can also be used in procedure (3) with slight modifications: these vectors must be calculated not at the current point x^k but at a point y^k randomly chosen in a neighborhood of x^k (see the discussion in Section 3). In other words, y^k = x^k + e^k, where e^k is a stochastic vector such that ||e^k|| → 0 as k → ∞. This statement is clarified in the following sections.

2.2. Stochastic networks with failures

Consider a network of connected elements which can be in 'working' or 'not working' states. The network has an entry and an exit, and it is considered operating if a path exists from the entry to the exit. Denote the random time for element i to work without failure by τ_i(x, w), where x ∈ R^n represents a vector of control parameters and w is a vector of random factors. Then the lifetime f(x, w) of the system is expressed through the times τ_i(x, w) by means of max and min operations:

f(x, w) = max_{P_s ∈ P} min_{e ∈ P_s} τ_e(x, w),

where P is the set of paths from the entry to the exit of the network; index e denotes an element within a path.

It is obvious that for rather general networks the function f(x, w) cannot be calculated analytically (it is difficult to enumerate the paths) to implement deterministic approach (2). But a simple algorithm allows us to calculate f(x, w) and its stochastic quasi-gradients for each observation w. Very simple and practical situations also show that F(x) = E f(x, w) may be a nondifferentiable function (see Krivulin [23] and Ermoliev and Norkin [8]).

The composite function f(x, w) defined by max and min operations has a rather complicated nondifferentiable character. The calculation of a subgradient of f(·, w) is impossible in the case when the chain rule is not valid. For example, for the Lipschitz continuous functions f_1(x, w) = 0, f_2(x, w) = −||x||, if

f(x, w) = max{f_1(x, w), f_2(x, w)},

then for Clarke's subdifferential ∂f(x, w) (see Clarke [4]) at x = 0 we have only the inclusion

∂f(0, w) ⊂ co{∂f_1(0, w) ∪ ∂f_2(0, w)},

and hence ∂E f(0, w) ≠ E ∂f(0, w). In such cases we can use the general purpose estimates (5) and (6) and many other similar estimates based on the concept of mollifier subgradients (see Sections 3 and 4).

2.3. A simple transfer line

A transfer line (see Mirzoahmedov [26]) consists of n sequentially connected devices. A customer who enters the line is sequentially served by each device if it has been switched on in advance. The variable x_i denotes the moment of switching on device i; y_i is the moment when the customer leaves device i; y_0(w) represents the moment the customer comes to the line; and τ_i(w) denotes the (random) service time of the i-th device. Let a_i and b_i denote the costs per unit of time associated with waiting by the customer for the switching on of the i-th device and with its corresponding service. The total random cost f^n is then calculated sequentially, for i = 1, 2, ..., n. The functions f^i(x, y, w) are again constructed by means of max and min operations and are nonconvex and nonsmooth. Discontinuous problems are encountered in the case of periodically operating devices or devices that may fail (Ermoliev et al. [12]).
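The max-min lifetime formula of Section 2.2 can be evaluated for a given observation w without any closed-form analysis; a minimal sketch, in which the network topology and the element-lifetime model τ_i(x, w) = x_i·w_i (unit-exponential w_i) are illustrative assumptions:

```python
import random

# Max-min system lifetime of Section 2.2:
#   f(x, w) = max over paths of min over path elements of tau_e(x, w),
# for a tiny two-terminal network.
PATHS = [(0, 1), (0, 2), (3,)]   # element indices along each entry-exit path

def lifetime(x, w):
    tau = [xi * wi for xi, wi in zip(x, w)]
    return max(min(tau[e] for e in path) for path in PATHS)

rng = random.Random(3)
x = [1.0, 2.0, 0.5, 1.5]

# One observation of f(x, w) and a Monte Carlo estimate of F(x) = E f(x, w).
w = [rng.expovariate(1.0) for _ in x]
print(lifetime(x, w))
N = 20000
F = sum(lifetime(x, [rng.expovariate(1.0) for _ in x]) for _ in range(N)) / N
print(F)
```

Because `lifetime` is built from max and min operations only, F(x) is typically nonsmooth in x even though each sample is easy to compute.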

2.4. Pollution control under extreme events

A common feature of most models used in designing pollution-control policies is the use of transfer coefficients a_ij linking the amount of pollution x_i emitted by source i to the resulting pollution concentration y_j at receptor location j:

y_j = Σ_i a_ij x_i.

The coefficients a_ij are often computed with Gaussian-type diffusion equations. These equations are solved over all possible meteorological conditions, and the outputs are then weighted by the frequencies of meteorological inputs over a given time interval, yielding average transfer coefficients. Deterministic models ascertain cost-effective emission strategies subject to achieving exogenously specified environmental goals, such as ambient average standards at receptors. These models have been improved by the inclusion of chance constraints that account for the random nature of the problem in order to reduce extreme events:

F_j(x) = P{ Σ_i a_ij(w) x_i ≤ q_j } ≥ p_j,   j = 1, ..., m,

namely, the probability that the deposition level at each receptor (country) j will not exceed the critical load (threshold) q_j at a given probability (acceptable risk level) p_j. If there is a finite number of possible values (scenarios) w, reflecting, for example, prevailing weather conditions, then F_j(x) is a piecewise constant function. The gradients of such functions are 0 almost everywhere; hence, the conventional optimization techniques cannot be used.
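A minimal sketch of why F_j(x) is piecewise constant under finitely many scenarios; the scenario data and threshold below are illustrative assumptions:

```python
# With finitely many equally likely weather scenarios, the chance-constraint
# function F(x) = P{ sum_i a_i(w) * x_i <= Q } is piecewise constant in x.
SCENARIOS = [          # transfer coefficients a_i(w) for one receptor
    [0.2, 0.5],
    [0.4, 0.1],
    [0.6, 0.3],
]
Q = 1.0                # critical load (threshold) at the receptor

def F(x):
    ok = sum(1 for a in SCENARIOS if a[0] * x[0] + a[1] * x[1] <= Q)
    return ok / len(SCENARIOS)

print(F([1.0, 1.0]))   # scenario depositions 0.7, 0.5, 0.9 -> all <= Q -> 1.0
print(F([2.0, 1.0]))   # depositions 0.9, 0.9, 1.5 -> 2 of 3 satisfied
# Small changes of x leave F unchanged: its gradient is 0 almost everywhere.
print(F([2.01, 1.0]) == F([2.0, 1.0]))
```

This is exactly the situation in which gradient information is useless and mollifier-type smoothing becomes necessary.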

2.5. Screening irreversible changes

In the following example we use a data-based model of cervical cancer screening (Oortmarssen and Ermoliev [28]). A life history sample is represented by w. There are two time moments: T_P denotes the time of entry into the progressive screen-detectable stage, and T_D is the time of clinical diagnosis of a cancer. Therefore,

T_D = T_P + Z_PD,

where T_P and Z_PD are independent nonnegative random variables with probability distribution functions F_P and F_PD. The disease can be detected by a screening examination at time x such that T_P < x < T_D. In this case, life expectancy is defined by a random variable T_L with distribution F_L(t). Otherwise, survival time following clinical diagnosis and treatment is described by a nonnegative random variable Z_DC with distribution F_DC(t). A sample of the life history is w = (T_P, T_D, T_C, T_L). The years of life gained is defined as

f(x, w) = T_L − T_C   if T_L > T_C and T_P < x < T_D,
f(x, w) = 0           otherwise.

Therefore, the expected performance is

where q(T_D) denotes the expected number of years of life gained at a given T_D. The sample performance is again a nondifferentiable and implicitly given function. An additional complexity is that the positive values of these functions occur with low probability.


2.6. Queuing networks

A network consists of devices which 'serve' customers. At any moment device i = 1, 2, ... serves only one customer, which is then transferred to another device in accordance with a certain routing procedure. If the device is busy, then the customer waits in a queue and is served according to the first come, first served rule. Let n_i represent the length of the queue at the initial moment; τ_ij(x, w) is the (random) service time of message j, depending on some control parameter x and an uncontrolled (random) parameter w; α_ij(x, w) denotes the arrival time when message j comes to node i; β_ij(x, w) is the time when device i starts to serve customer j; and γ_ij(x, w) indicates the time when device i finishes serving customer j. The customer-routing procedure is given by integer functions ρ_ij(x, w) defining a destination node for the j-th message served at the i-th node.

The logic of a node operation is described by the following recurrent relations:

β_ij = max(α_ij, γ_{i,j−1}),   γ_ij = β_ij + τ_ij.

It is essential that the times γ_ij for this and more general networks can be expressed through the functions τ_ij(x, w) by means of max and min operations and by positive linear combinations (Krivulin [23], Ermoliev and Norkin [8]). To illustrate this fact, assume ρ_ij = ρ_i. If we denote

Z_i = {nodes r | ρ_r = i},

then

Various important indicators (waiting time, queue length, node load, and so on) of network performance are based on the times γ_ij(x, w). Therefore, they are implicitly given functions of τ_ij(x, w) and are nonsmooth in x.
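The node recurrences above can be sketched for a single FCFS device; the arrival and service times are illustrative assumptions:

```python
# Single-device FCFS recurrences of Section 2.6:
#   beta_j = max(alpha_j, gamma_{j-1}),  gamma_j = beta_j + tau_j.

def finish_times(arrivals, services):
    gammas, prev_finish = [], 0.0
    for alpha, tau in zip(arrivals, services):
        beta = max(alpha, prev_finish)   # service starts when device is free
        prev_finish = beta + tau         # gamma: service completion time
        gammas.append(prev_finish)
    return gammas

# Three customers; completion times are built from max operations and
# positive sums only, hence nonsmooth (but Lipschitz) in the inputs.
print(finish_times([0.0, 1.0, 1.5], [2.0, 0.5, 1.0]))  # [2.0, 2.5, 3.5]
```

Waiting times and queue lengths follow from these completion times, which is why the performance indicators inherit the max/min nonsmoothness.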

3. Nonsmooth sample functions

Consider the following general stochastic optimization problem:

Minimize F(x) = E_w f(x, w)   (7)

subject to x ∈ X ⊂ R^n,   (8)

where w ∈ Ω, (Ω, Σ, P) is some probability space, E_w denotes the symbol of mathematical expectation, and f : X × Ω → R^1 is a random (i.e., measurable in w for fixed x) integrable function which can be nonconvex, nonsmooth, and even discontinuous. In this problem, the expectation function F(x) may still be differentiable, with gradient

∇F(x) = E_w g(x, w)   (9)

for an appropriately defined g(x, w) (see Krivulin [23], Glasserman [13], and Rubinstein and Shapiro [32]). This approach corresponds to the infinitesimal perturbation analysis for discrete event systems (see Ho and Cao [17] and Suri [33]). It is important to note that if equality (9) is valid but f(x, w) is not continuously differentiable, then the convergence of method (3) with ξ^k = g(x^k, w^k) can only be studied within the general context of the nonsmooth optimization techniques considered in this paper.

3.1. Lipschitzian expectations

Consider problem (7) and (8), where f : R^n × Ω → R^1 satisfies the Lipschitz condition

|f(x, w) − f(y, w)| ≤ L_K(w) ||x − y||,   x, y ∈ K,

with a square-integrable Lipschitz constant L_K(w), where K is a compact set in R^n. The expectation function F(x) is then also Lipschitzian, with constant L_K = ∫_Ω L_K(w) P(dw).

Denote by ∂F(x) and ∂f(x, w) the Clarke subdifferentials [4] of F(x) and f(x, w). The main difficulty concerns the estimation of a subgradient from ∂F(x).


There is, in fact, no calculus for such a vector, for example, by using a chain rule. The interchange formula for the differentiation and integration operators,

∂F(x) = E ∂_x f(x, w),

is generally not valid (see Section 2.2), and therefore it is impossible to estimate an element of ∂F(·) assuming we can calculate elements of ∂f(·, w). Usually only a set G_f(·, w) containing ∂f(·, w) is known.

Let φ : R^n → R^1 be some probability density function on R^n such that φ(x) = 0 outside some bounded set in R^n. Consider a parametric family of mollifiers ψ_θ(x) = (1/θ^n) φ(x/θ), θ > 0 (see Section 3.2 for the exact definition), and a family of smoothed (averaged) functions

F_θ(x) = ∫_{R^n} F(y) ψ_θ(y − x) dy.

Note that F_θ(x) incorporates global information on the slopes of the function F(x) in a vicinity of x defined by the 'weights' ψ_θ(·). The functions F_θ(x) have been considered in optimization theory (see Yudin [34], Archetti and Betrò [1], Gupal [14,15], Gupal and Norkin [16], Batuhtin and Maiboroda [2], Mayne and Polak [24], Kreimer and Rubinstein [22], and Ermoliev et al. [9]). The convolution with an appropriate mollifier improves differentiability, but it also increases the computational complexity of the resulting problems, since it changes a deterministic function F(x) into an expectation function defined as a multiple integral. Therefore, this operation is meaningful only with appropriate stochastic optimization techniques.

If the function φ(x) is continuously differentiable (or constant inside some convex set and equal to zero outside it), then the smoothed functions F_θ(x), θ > 0, are continuously differentiable and converge to F(x) as θ → 0 uniformly in X. Suppose the random functions f(x, w) are measurable in both variables (x, w). Then

F_θ(x) = E f_θ(x, w),

where f_θ(x, w) = ∫_{R^n} f(y, w) ψ_θ(y − x) dy, θ > 0. The functions f_θ(x, w) are Lipschitzian in x (with the same Lipschitz constant L_K(w)) and even continuously differentiable in x. Therefore, the functions F_θ(x), θ > 0, are also continuously differentiable, and the following differentiation formula is true:

∇F_θ(x) = E ∇_x f_θ(x, w).

From this formula one can obtain different representations for ∇F_θ(x), depending on the form of the mollifier. If the mollifier is the uniform probability density on the cube {x : |x_i| ≤ θ/2, i = 1, ..., n} (as in Gupal [14,15]), then

∇F_θ(x) = E ξ_θ(x, η, w),   (10)

where

ξ_θ(x, η, w) = (1/θ) Σ_{i=1}^n [ f(x_1 + θη_1, ..., x_{i−1} + θη_{i−1}, x_i + θ/2, x_{i+1} + θη_{i+1}, ..., x_n + θη_n, w) − f(x_1 + θη_1, ..., x_{i−1} + θη_{i−1}, x_i − θ/2, x_{i+1} + θη_{i+1}, ..., x_n + θη_n, w) ] e_i,

and the e_i are unit coordinate vectors. This means that ∇F_θ(x) is the mathematical expectation of the finite difference approximation ξ_θ(x, η, w), where w has distribution P and η = (η_1, ..., η_n) is a random vector with components uniformly distributed on the interval [−1/2, +1/2]. In other words, ξ_θ(x, η, w) is an unbiased estimate of the gradient ∇F_θ at point x. We call each such vector a stochastic mollifier gradient of F(x). The vector ξ_θ requires the calculation of the function f(x, w) at 2n points. Of course, there may be various other finite difference estimators for ∇F_θ(x) (see Gupal [15], Katkovnik [20], Kreimer and Rubinstein [22], Ermoliev and Gaivoronski [7], and Section 3.2).
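A sketch of this 2n-point estimate; the test function and parameters are illustrative assumptions, chosen so that, away from the kinks of f, the averaged estimate reproduces the gradient exactly:

```python
import random

# Gupal-type stochastic mollifier gradient: central differences per
# coordinate, with the remaining coordinates perturbed by theta * eta,
# where eta_i ~ Uniform[-1/2, 1/2].

def f(x, w=None):
    # Nonsmooth test function |x1| + |x2| (our assumption; w is unused here).
    return abs(x[0]) + abs(x[1])

def mollifier_gradient(f, x, theta, rng, w=None):
    n = len(x)
    eta = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    grad = []
    for i in range(n):
        up = [xj + theta * ej for xj, ej in zip(x, eta)]
        dn = list(up)                 # same eta-perturbed off-coordinates
        up[i] = x[i] + theta / 2      # central difference in coordinate i
        dn[i] = x[i] - theta / 2
        grad.append((f(up, w) - f(dn, w)) / theta)
    return grad

rng = random.Random(4)
x, N = [0.3, -0.2], 4000
g = [0.0, 0.0]
for _ in range(N):
    gi = mollifier_gradient(f, x, 0.05, rng)
    g = [a + b / N for a, b in zip(g, gi)]
print(g)  # close to the gradient (1, -1) away from the kinks
```

Each call evaluates f at 2n points, as noted above; at kinks of f the averaged vector approximates a mollifier subgradient rather than a classical gradient.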

If we know the analytical structure of the Lipschitz function f(·, w) and its Clarke subgradient g(x, w), then formula (10) can be rewritten as

∇F_θ(x) = E g(x + θη, w),   (11)

with η as above.


The stochastic quasi-gradient method of unconstrained (X = R^n) optimization of the Lipschitz function F(x) has the form of procedure (3) with ξ^k = g(x^k + θ_k η^k, w^k) or ξ^k = ξ_{θ_k}(x^k, η^k, w^k), where the nonnegative step multipliers ρ_k and smoothing parameters θ_k satisfy the conditions

ρ_k ≥ 0,   Σ_k ρ_k = ∞,   Σ_k ρ_k² < ∞,   (12)

lim_{k→∞} ρ_k = lim_{k→∞} θ_k = lim_{k→∞} ρ_k/θ_k = lim_{k→∞} |θ_k − θ_{k+1}|/ρ_k = 0.   (13)

The procedure uses optimization steps concurrently with approximation steps, as proposed in Ermoliev and Nurminski [10] and Katkovnik and Kulchitsky [21].

Theorem 3.1 (Gupal [15]). Assume that the random trajectories {x^k} generated by procedure (3) are bounded. Suppose also that the set of function values F(x) on the set X* = {x ∈ R^n | 0 ∈ ∂F(x)} is finite or countable. Then, under the above-mentioned conditions, all cluster points of almost all trajectories {x^k} belong to X*, and the sequence {F(x^k)} has a limit as k → ∞.

Conditions (12) are typical for standard stochastic approximation-type algorithms. The additional requirements (13) are not very restrictive (for instance, ρ_k = C/k^p and θ_k = C/k^q with 1/2 < p ≤ 1, 0 < q < p, and C > 0 satisfy them). Thus procedure (3), with conditions (12) and (13), generalizes standard stochastic approximation methods to nonsmooth functions. The case ξ^k = ξ_{θ_k}(x^k, η^k, w^k) provides a general purpose approach. In the case ξ^k = g(x^k + θ_k η^k, w^k) a question remains unanswered: how do we calculate Clarke's subgradients of the Lipschitz functions f(·, w)? To answer this question we consider the following important case.

3.2. Generalized differentiability

The calculus of subgradients (see Clarke [4]) in general states only that ∂f(·, w) ⊂ G_f(·, w), where G_f(·, w) is some extended subgradient set determined by the structure of f. The equality holds true for the special case of subdifferentially regular functions, which does not cover important applications.

As demonstrated in Section 2, in many situations we deal not with a general class of Lipschitz functions but with a subclass generated from some basic (continuously differentiable) functions by means of maximum, minimum, or smooth transformation operations. Consider the case when G_f(x, w) is a singleton for almost all x.

Definition 3.1 (Norkin [27]). The function f : R^n → R is called generalized differentiable (GD) at x ∈ R^n if in a vicinity of x there exists an upper semicontinuous multivalued mapping G_f with convex compact values G_f(x) such that

f(y) = f(x) + ⟨g, y − x⟩ + o(x, y, g),

where ⟨·, ·⟩ denotes the inner product of two vectors, g ∈ G_f(y), and the remainder term satisfies the following condition: lim_k o(x, y^k, g^k)/||y^k − x|| = 0 for any sequences y^k → x, g^k → g, g^k ∈ G_f(y^k). The function f is called generalized differentiable if it is generalized differentiable at each point x ∈ R^n.

Generalized differentiable (GD) functions possess the following properties (see Norkin [27] and Mikhalevich et al. [25]): they are locally Lipschitzian; continuously differentiable, convex, and concave functions are generalized differentiable; the class of GD functions is closed with respect to max and min operators and superpositions; there is a calculus of subgradients,

G_{f_0(f_1, ..., f_m)}(x) = { g = Σ_{i=1}^m q_i g_i : (q_1, ..., q_m) ∈ G_{f_0}(f_1(x), ..., f_m(x)), g_i ∈ G_{f_i}(x) },   (15)

so the subdifferential G_{f_0(f_1, ..., f_m)} of a composite function f_0(f_1, ..., f_m) is calculated by the chain rule; the class of GD functions is closed with respect to taking expectations, with G_F(x) = E G_f(x, w) for F(x) = E f(x, w), where f(·, w) is a GD function; and ∂_Clarke f(x) ⊂ G_f(x), where G_f(x) is a singleton almost everywhere in R^n.
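The max/min closure and chain rule of the GD calculus can be sketched by propagating a subgradient through an active branch; the composite function below is an illustrative assumption:

```python
# Branch-wise subgradient for a composition of max/min of smooth functions:
# for a max (min) we may take the (generalized) gradient of any branch
# attaining the max (min), and chain it through the composition.

def val_and_subgrad(x):
    # f(x) = max( min(x0, x1), x2 ), x in R^3 (our illustrative example).
    if x[0] <= x[1]:
        inner, g_inner = x[0], [1.0, 0.0, 0.0]   # min attained by x0
    else:
        inner, g_inner = x[1], [0.0, 1.0, 0.0]   # min attained by x1
    if inner >= x[2]:
        return inner, g_inner                    # max attained by the min
    return x[2], [0.0, 0.0, 1.0]                 # max attained by x2

print(val_and_subgrad([2.0, 3.0, 1.0]))  # (2.0, [1.0, 0.0, 0.0])
```

At points where several branches tie, any returned vector is one element of the (set-valued) subdifferential G_f(x), which is a singleton almost everywhere.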

Finally, for the minimization of a GD expectation function F(x) = E f(x, w) over a convex compact set K, the following stochastic mollifier gradient method is applicable:


where Π_K(y) is the orthogonal projection of y onto K. From Section 2 it follows that generalized differentiable functions may be important for queuing and other discrete event systems. Therefore we can view the calculus (15), together with procedures (16)-(18), as an extension of smooth perturbation analysis (see Ho and Cao [17] and Suri [33]) to nonsmooth cases.
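A minimal sketch of a projected stochastic quasi-gradient iteration of the kind described here; the box K, the step rule, and the toy objective are illustrative assumptions, not the paper's procedures (16)-(18):

```python
import random

# Projected step x^{k+1} = Pi_K(x^k - rho_k * xi^k), K = [0, 1]^2 (a box).
# Toy GD objective: F(x) = E|x0 - w| + (x1 - 0.5)^2, w ~ Uniform(0, 1);
# its minimizer over K is (0.5, 0.5) (the median of w, and 0.5).

def project_box(y, lo=0.0, hi=1.0):
    # Orthogonal projection onto the box K.
    return [min(max(v, lo), hi) for v in y]

def quasi_gradient(x, w):
    # Subgradient of |x0 - w| is sign(x0 - w); gradient of (x1 - 0.5)^2.
    g0 = 1.0 if x[0] > w else -1.0
    return [g0, 2.0 * (x[1] - 0.5)]

rng = random.Random(5)
x = [0.9, 0.1]
for k in range(1, 50001):
    rho = 1.0 / k ** 0.75    # steps with sum rho = inf, sum rho^2 < inf
    xi = quasi_gradient(x, rng.random())
    x = project_box([x[0] - rho * xi[0], x[1] - rho * xi[1]])
print(x)  # near (0.5, 0.5)
```

The projection keeps every iterate feasible, so the scheme applies directly under the convex compact constraint set K assumed above.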

4. Stochastic discontinuous optimization

In this section we extend the results of Ermoliev et al. [9] to discontinuous stochastic optimization problems. These results are essentially based on a notion of discontinuity that prevents the system from instantaneous jumps and returns to normal states (strongly lower semicontinuous (lsc) functions). This notion must be elaborated for the case of stochastic systems.

4.1. Some classes of discontinuous functions

Definition 4.1. The function F : R^n → R^1 is called strongly lower semicontinuous at x if it is lower semicontinuous at x and there exists a sequence x^k → x, with F continuous at x^k (for all k), such that F(x^k) → F(x). The function F is called strongly lower semicontinuous on X ⊂ R^n if this holds for all x ∈ X.

To give a sufficient condition for the expectation F(x) = E f(x, w) to be strongly lower semicontinuous, we introduce subclasses of directionally continuous and piecewise continuous functions.

Definition 4.2. The function F : R^n → R^1 is called directionally continuous at x if there exists an open (direction) set D(x) containing sequences x^k ∈ D(x), x^k → x, such that for each such sequence F(x^k) → F(x). The function F(x) is called directionally continuous if this holds for any x ∈ R^n.

Definition 4.3. The function F(x) is called piecewise continuous if for any open set A ⊂ R^n there is another open set B ⊂ A on which F(x) is continuous.

Proposition 4.1. If the function F(x) is lsc, piece- wise continuous and directionally continuous, then it is strongly lower semicontinuous. Y

Proof. By definition of piecewise continuity, for any open vicinity V(x) of x we can find an open set B ⊂ D(x) ∩ V(x) on which the function F is continuous. Hence there exists a sequence x^k ∈ D(x), x^k → x, with F continuous at x^k. By definition of directional continuity, F(x^k) → F(x). □

Properties of directional continuity, piecewise continuity, and strong lower semicontinuity can be easily verified for one-dimensional functions. Obviously, these properties are preserved under continuous transformations. The next proposition clarifies the structure of multidimensional discontinuous functions of interest.
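As a one-dimensional illustration (not from the paper), the step function below is strongly lsc at 0: it is lsc there, and it can be approached through continuity points from the left, which also makes it directionally continuous with direction set D(0) = (−∞, 0). A small numerical check:

```python
def F(x):
    # a discontinuous step function: 0 for x <= 0, 1 for x > 0
    return 0.0 if x <= 0 else 1.0

# lower semicontinuity at 0: F(0) <= F(x_k) along every sequence x_k -> 0
mixed = [(-1.0) ** k / k for k in range(1, 200)]
assert all(F(0.0) <= F(x) for x in mixed)

# strong lsc: approach 0 through continuity points of F (here from the left,
# i.e., through the direction set D(0) of Definition 4.2); then F(x_k) -> F(0)
left = [-1.0 / k for k in range(1, 200)]
assert all(F(x) == F(0.0) for x in left)
```

Approaching 0 from the right gives F(x_k) = 1 ≠ F(0), so the direction set matters: only sequences inside D(0) recover the limit value.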

Proposition 4.2. If the function F(x) has the form F(x) = F_0[F_1(x_1), ..., F_m(x_m)], where x = (x_1, ..., x_m), x_i ∈ R^{n_i}, the function F_0(·) is continuous and the functions F_i(x_i), i = 1, ..., m, are strongly lsc (directionally continuous), then the composite function F(x) is also strongly lsc (directionally continuous).

If F(x) = F_0[F_1(x), ..., F_m(x)], x ∈ R^n, where F_0(·) is continuous and F_i(x), i = 1, ..., m, are piecewise continuous, then F(x) is also piecewise continuous.

The proof is evident.

Proposition 4.3. Assume that the function f(·, ω) is locally bounded around x by an integrable (in ω) function, piecewise continuous around x and almost surely (a.s.) directionally continuous at x with direction set D(x, ω) = D(x) (not depending on ω). Suppose ω takes only a finite or countable number of values. Then the expectation function F(x) = E f(x, ω) is strongly lsc at x.

For proof see Appendix A.


4.2. Mollifier subgradients

Averaged functions associated with a given function F(x) are defined by means of a family of mollifiers (density functions). Let us introduce the necessary notions and facts, which are generalized in Section 4.3 to the case of constrained problems.

Definition 4.4. Given a locally integrable (discontinuous) function F : R^n → R^1 and a family of mollifiers {ψ_θ : R^n → R_+, θ ∈ R_+},

where ψ_θ(·) ≥ 0 and ∫_{R^n} ψ_θ(z) dz = 1, the associated family {F_θ, θ ∈ R_+} of averaged functions is defined by

F_θ(x) = ∫_{R^n} F(x − z) ψ_θ(z) dz.

Example 4.1. Assume F(x) = E f(x, ω). If f(x, ω) is such that E_ω |f(x, ω)| exists and grows at infinity not faster than some polynomial of x and the random vector v has the standard normal distribution, then for the Gaussian mollifiers

ψ_θ(z) = (2π)^{−n/2} θ^{−n} exp(−‖z‖² / 2θ²)

we have

F_θ(x) = E_{v,ω} f(x + θv, ω),   ∇F_θ(x) = E_{v,ω} ξ_θ(x, v, ω),

with ξ_θ(x, v, ω) = [f(x + θv, ω) − f(x, ω)] v / θ. The finite-difference approximations ξ_θ(x, v, ω) are unbiased estimates of ∇F_θ(x). As before, we can call them stochastic mollifier gradients of F(x).

Definition 4.5. (See, for example, Rockafellar and Wets [30].) A sequence of functions {F^k : R^n → R} epi-converges to F : R^n → R relative to X ⊂ R^n if for any x ∈ X,

(i) lim inf_{k→∞} F^k(x^k) ≥ F(x) for all x^k → x, x^k ∈ X;

(ii) lim_{k→∞} F^k(x^k) = F(x) for some sequence x^k → x, x^k ∈ X.

The sequence {F^k} epi-converges to F if this holds relative to X = R^n.

For example, if g : R^n × R^m → R is (jointly) lsc at (x, ȳ) and is continuous in y at ȳ, then for any sequence y^k → ȳ the corresponding sequence of functions F^k(·) = g(·, y^k) epi-converges to F(·) = g(·, ȳ).

We use the following important property of epi-convergent functions.

Theorem 4.1. If a sequence of functions {F^k : R^n → R} epi-converges to F : R^n → R, then for any compact K ⊂ R^n,

lim_{ε↓0} [lim sup_k (inf_{K_ε} F^k)] = lim_{ε↓0} [lim inf_k (inf_{K_ε} F^k)] = inf_K F,   (20)

where K_ε = K + εB and B is the unit ball. If F^k(x^k_ε) ≤ inf_{K_ε} F^k + δ_k, x^k_ε ∈ K_ε, δ_k ↓ 0 as k → ∞, then

lim sup_{ε↓0} (lim sup_k x^k_ε) ⊂ argmin_K F,   (21)

where (lim sup_k x^k_ε) denotes the set X_ε of cluster points of the sequence {x^k_ε} and (lim sup_{ε↓0} X_ε) denotes the set of cluster points of the family {X_ε, ε ∈ R_+} as ε ↓ 0.

For proof see Appendix A.

Jointly with Propositions 4.1 and 4.3 the following statement gives sufficient conditions for averaged functions to epi-converge to the original discontinuous expectation function.

Proposition 4.4. (Ermoliev et al. [9].) For any strongly lower semicontinuous, locally integrable function F : R^n → R, any associated sequence of averaged functions {F_{θ_k}, θ_k ↓ 0} epi-converges to F.
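A numerical illustration (assumed, not from the paper): for the strongly lsc function F(x) = x² + 1_{x<0}, Gaussian averaging has the closed form F_θ(x) = x² + θ² + Φ(−x/θ). As θ ↓ 0, grid minimization over [−2, 2] shows the minimizers and minimal values of the smooth F_θ approaching those of the discontinuous F (argmin 0, infimum 0), in line with epi-convergence:

```python
from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def F_theta(x, theta):
    # closed-form Gaussian averaging of F(x) = x**2 + (1 if x < 0 else 0):
    # E[(x + theta*v)**2] = x**2 + theta**2,  E[1{x + theta*v < 0}] = Phi(-x/theta)
    return x * x + theta * theta + Phi(-x / theta)

grid = [i / 1000.0 for i in range(-2000, 2001)]   # the compact set K = [-2, 2]
argmins, minvals = [], []
for theta in (0.5, 0.2, 0.05):
    x_star = min(grid, key=lambda x: F_theta(x, theta))
    argmins.append(x_star)
    minvals.append(F_theta(x_star, theta))
# as theta shrinks, argmins drift toward 0 and minvals toward inf F = 0
```

This is exactly the pattern Theorem 4.1 formalizes: minima and near-minimizers of the epi-convergent approximations track those of the limit function.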

Propositions 4.1, 4.3 and 4.4 and Theorem 4.2 justify the use of stochastic mollifiers.


Definition 4.6. Let the function F : R^n → R be locally integrable and {F_k := F_{θ_k}} be a sequence of averaged functions generated from F by means of the sequence of mollifiers {ψ_k := ψ_{θ_k} : R^n → R}, where θ_k ↓ 0 as k → ∞. Assume that the mollifiers are such that the averaged functions F_k are smooth (of class C^1). The set of ψ-mollifier subgradients of F at x is by definition

∂_ψ F(x) = lim sup_{x^k → x} ∇F_k(x^k),

i.e., ∂_ψ F(x) consists of the cluster points of all possible sequences {∇F_k(x^k)} such that x^k → x.

Theorem 4.2. (Ermoliev et al. [9].) Suppose that F : R^n → R is strongly lower semicontinuous and locally integrable. Then for any sequence {ψ_{θ_k}} of smooth mollifiers, we have 0 ∈ ∂_ψ F(x) whenever x is a local minimizer of F.

4.3. Constrained discontinuous optimization

Theorem 4.2 can be used for constrained optimization problems if exact penalties are applicable. Unfortunately, in stochastic optimization exact values of constraints are often not available. Besides, we also encounter the following difficulty. Consider min{√x | x ≥ 0}. In any reasonable definition of gradients, the gradient of the function √x at the point x = 0 equals +∞. Hence, to formulate necessary optimality conditions for such problems, possibly involving discontinuities, we need a special notion which incorporates infinite quantities. An appropriate notion is the cosmic vector space R̄^n introduced by Rockafellar and Wets [31]. Denote R_+ = {x ∈ R | x ≥ 0} and R̄_+ = R_+ ∪ {+∞}.
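The blow-up at 0 is easy to see numerically. This tiny check (illustrative, not from the paper) evaluates central finite differences of √x at points shrinking toward 0; the quotients grow without bound, which is the infinite "magnitude" the cosmic space records separately from the "direction":

```python
import math

grads = []
for x in [1e-2, 1e-4, 1e-6]:
    h = x / 10.0  # step shrinks with x so x - h stays positive
    grads.append((math.sqrt(x + h) - math.sqrt(x - h)) / (2.0 * h))

# the quotients track d/dx sqrt(x) = 1/(2*sqrt(x)): roughly 5, 50, 500 here,
# growing without bound as x -> 0+
```

In cosmic terms the limiting gradient at 0 is the pair (direction +1, magnitude +∞) rather than an ordinary vector.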

Definition 4.7. Define the (cosmic) space R̄^n as a set of pairs x̄ = (x, a), where x ∈ R^n, ‖x‖ = 1 and a ∈ R̄_+. All pairs of the form (x, 0) are considered identical and are denoted as 0̄.

A topology in the space R̄^n is defined by means of cosmically convergent sequences.

Definition 4.8. A sequence (x^k, a_k) ∈ R̄^n is called (cosmically) convergent to an element (x, a) ∈ R̄^n (denoted c-lim_{k→∞} (x^k, a_k)) if either lim_k a_k = a = 0 or there exist both limits lim_k x^k ∈ R^n, lim_k a_k ∈ R̄_+ with x = lim_k x^k and a = lim_k a_k ≠ 0.

Denote

c-Limsup_k (x^k, a_k) = {(x, a) ∈ R̄^n | ∃{k_s} : c-lim_s (x^{k_s}, a_{k_s}) = (x, a)}.

For a closed set K ⊂ R^n denote the tangent cone (at x ∈ K) by T_K(x), the normal cones by

N̂_K(x) = {u ∈ R^n | ⟨u, w⟩ ≤ 0 for all w ∈ T_K(x)},

N_K(x) = lim sup_{x′ → x} N̂_K(x′),

and the extended normal cone N̄_K(x) by the closure of the set {(u/‖u‖, ‖u‖) | u ∈ N_K(x), u ≠ 0} ∪ {0̄} in the cosmic space R̄^n.

Definition 4.9. Let the function F : R^n → R be locally integrable and {F_k := F_{θ_k}} be a sequence of averaged functions generated from F by convolution with mollifiers {ψ_k := ψ_{θ_k} : R^n → R}, where θ_k ↓ 0 as k → ∞. Assume that the mollifiers are such that the averaged functions F_k are smooth (of class C^1). The set of the extended ψ-mollifier subgradients of F at x is by definition

∂̄_ψ F(x) = c-Limsup_{x^k → x} (N^k_F(x^k), ‖∇F_k(x^k)‖),

where

N^k_F(x^k) = ∇F_k(x^k) / ‖∇F_k(x^k)‖ if ‖∇F_k(x^k)‖ ≠ 0, and an arbitrary unit vector otherwise,

i.e., ∂̄_ψ F(x) consists of the cluster points of all possible sequences {(N^k_F(x^k), ‖∇F_k(x^k)‖)} such that x^k → x.

The extended mollifier subdifferential ∂̄_ψ F(x) is always a nonempty closed set.


Now we can formulate necessary optimality conditions for the constrained discontinuous optimization problem min{F(x) | x ∈ K}, where F(x) may have the form of an expectation.

Theorem 4.3. Let K be a closed set in R^n. Assume that a locally integrable function F has a local minimum relative to K at some point x ∈ K and there is a sequence x^k ∈ K, x^k → x, with F continuous at x^k and F(x^k) → F(x). Then, for any sequence {ψ_k} of smooth mollifiers, one has

N̄_K(x) ∩ (−∂̄_ψ F(x)) ≠ ∅,   (22)

where

−∂̄_ψ F(x) = {(−g, a) ∈ R̄^n | (g, a) ∈ ∂̄_ψ F(x)}.

For proof see Appendix A.

Example 4.2. Consider the optimization problem

min{√x | x ≥ 0}.

At x = 0 the gradients of the averaged functions along x^k ↓ 0 point in the direction +1 with magnitudes tending to +∞, so (1, +∞) ∈ ∂̄_ψ F(0); at the same time (−1, +∞) ∈ N̄_K(0) for K = R_+,

and thus optimality condition (22) is satisfied at x = 0.

Proposition 4.5 shows that optimality conditions are also satisfied for limits of some local minimizers x_ε of the relaxed problems min{F(x) | x ∈ K_ε = K + εB}.

Proposition 4.5. Let x_ε be a local minimizer such that there exists a sequence x^k_ε → x_ε, x^k_ε ∈ K_ε, with F continuous at x^k_ε and F(x^k_ε) → F(x_ε) as k → ∞. Assume x_{ε_m} → x for some ε_m ↓ 0 as m → ∞. Then condition (22) is satisfied at x.

The proof follows from Theorem 4.3 and the closedness of the (extended) mollifier subdifferential mapping x → ∂̄_ψ F(x) and the (extended) normal cone mapping (x, ε) → N̄_{K_ε}(x).

Proposition 4.6. If F is strongly lsc and the constraint set K is compact, then the set X* of points satisfying necessary optimality condition (22) is nonempty and contains at least one global minimizer of F in K.

The proof follows from Theorem 4.1 and Proposition 4.4.

Theorem 4.3 and Propositions 4.5 and 4.6 immediately give some indication for approximately solving the problem. Let us fix a small smoothing parameter θ and a small constraint relaxation parameter ε, and instead of the original discontinuous optimization problem consider a relaxed smoothed optimization problem:

min {F_θ(x) | x ∈ K_ε}.

Then the stochastic gradient method has the following form: x^0 is an arbitrary starting point;

x^{k+1} = Π_{K_ε}(x^k − ρ_k ξ_θ(x^k)),   k = 0, 1, ...,

where E{ξ_θ(x^k) | x^k} = ∇F_θ(x^k), Π_{K_ε} denotes the orthogonal projection operator on the convex set K_ε, and the step multipliers ρ_k satisfy condition (12). The vectors ξ_θ(x^k) can be called stochastic mollifier gradients, as they were in Sections 3.2 and 4.2.
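A minimal sketch of such a procedure (assumptions, not from the paper: a one-dimensional toy objective with a jump of height 0.5 at 0 plus additive noise, a Gaussian mollifier with fixed θ, K_ε = [−2, 2], and steps ρ_k = 2/(k + 50); all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, rng):
    # discontinuous stochastic objective: (x-1)^2, a jump at x = 0, plus noise
    jump = 0.5 if x < 0 else 0.0
    return (x - 1.0) ** 2 + jump + 0.1 * rng.standard_normal()

def xi(x, theta, rng):
    # stochastic mollifier gradient: unbiased estimate of grad F_theta(x)
    v = rng.standard_normal()
    return (f(x + theta * v, rng) - f(x, rng)) * v / theta

theta, lo, hi = 0.2, -2.0, 2.0    # smoothing parameter and relaxed set K_eps
x = -1.5                          # arbitrary starting point x^0
for k in range(20000):
    rho = 2.0 / (k + 50)          # sum rho_k = inf, sum rho_k^2 < inf
    x = min(max(x - rho * xi(x, theta, rng), lo), hi)  # projection on K_eps
# the iterates settle near the smoothed minimizer, close to x = 1
```

The iteration only ever evaluates the discontinuous objective itself; the random perturbation θv supplies the smoothing, and the projection keeps the iterates in the relaxed constraint set.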

The convergence of the stochastic gradient method with projection on a convex compact set for a smooth nonconvex objective function F_θ has been studied by Dorofeev [6].

5. Concluding remarks

The analysis of nonsmooth stochastic problems shows the importance of random search methods in directly confronting their inherent complexity.

A promising direction seems to be the use of stochastic mollifier gradients (see Sections 3.2 and 4.2). This concept incorporates two fundamental approaches to the differentiation of 'nonclassical' functions: the theory of distributions (theory of generalized functions) and nonsmooth analysis. Random search procedures take a global view of the 'landscape' of performance functions, enabling the user to bypass some local solutions and discontinuities. Numerical experiments with realistic discontinuous problems (see Oortmarssen and Ermoliev [28]) indicate fast convergence to an important neighborhood


of optimal solutions (see also Ermoliev and Gaivoronski [7]).

Of course, there are still more questions than answers. For example, the optimality conditions proposed in Section 4.3 must be elaborated and appropriate computational procedures must be developed. In this paper we primarily discussed the applicability of general purpose stochastic mollifier procedures that can be used for any nonsmooth problem. The study of specific classes of problems and the choice of the most suitable classes of mollifiers are important tasks for the future. For example, the lack of an analytical structure is a common fault in the problems in Section 2. By choosing an appropriate family of mollifiers ψ_θ it is possible to replace the unknown density p(·) by a well-defined smoothed one (see Eq. (4)). We hope to be able to answer some of these questions in the near future.

Appendix A

Proof of Proposition 4.3. The lower semicontinuity of F follows from Fatou's lemma. The convergence of F(x^k) to F(x) for x^k → x, x^k ∈ D(x), follows from Lebesgue's dominated convergence theorem. Hence F is directionally continuous at x with direction set D(x). It remains to show that, in any open set A ⊂ R^n close to x, there are points of continuity of F. For the case when ω takes a finite number of values ω_1, ..., ω_m with probabilities p_1, ..., p_m, the function F(·) = Σ_{i=1}^m p_i f(·, ω_i) is clearly piecewise continuous. For the case when ω takes a countable number of values there is a sequence of closed balls B_i ⊂ B_{i−1} ⊂ A converging to some point y ∈ A with f(·, ω_i) continuous on B_i. We show that F(·) = Σ_{i=1}^∞ p_i f(·, ω_i) is continuous at y. By assumption |f(x, ω_i)| ≤ C_i for x ∈ A and Σ_{i=1}^∞ p_i C_i < +∞. Then for any x^k → y and any m,

|F(x^k) − F(y)| ≤ Σ_{i=1}^m p_i |f(x^k, ω_i) − f(y, ω_i)| + Σ_{i=m+1}^∞ 2 p_i C_i.

Since Σ_{i=m+1}^∞ 2 p_i C_i → 0 as m → ∞, it follows that lim_k F(x^k) = F(y). □

Proof of Theorem 4.1. Note that inf_{K_ε} F^k monotonically increases as ε ↓ 0; hence the same holds for lim inf_{k→∞} inf_{K_ε} F^k and lim sup_{k→∞} inf_{K_ε} F^k. Thus the limits over ε ↓ 0 in (20) exist.

Let us take an arbitrary sequence ε_m ↓ 0, indices k^s_m, and points x^s_m ∈ K_{ε_m} such that for fixed m,

lim inf_k (inf_{K_{ε_m}} F^k) = lim_{s→∞} (inf_{K_{ε_m}} F^{k^s_m}) = lim_{s→∞} F^{k^s_m}(x^s_m).

Thus

lim_{ε↓0} [lim sup_k (inf_{K_ε} F^k)] ≥ lim_{ε↓0} [lim inf_k (inf_{K_ε} F^k)] = lim_{m→∞} F^{k^{s_m}_m}(x^{s_m}_m)

for some indices s_m. By property (i) of epi-convergence, lim_{m→∞} F^{k^{s_m}_m}(x^{s_m}_m) ≥ inf_K F. Hence

lim_{ε↓0} [lim sup_k (inf_{K_ε} F^k)] ≥ lim_{ε↓0} [lim inf_k (inf_{K_ε} F^k)] ≥ inf_K F.

Let us prove the opposite inequality. Since F is lower semicontinuous, it follows that F(x) = inf_K F for some x ∈ K. By condition (ii) of epi-convergence, there exists a sequence x^k → x such that F^k(x^k) → F(x). For k sufficiently large, x^k ∈ K_ε; hence inf_{K_ε} F^k ≤ F^k(x^k) and

lim_{ε↓0} [lim sup_k (inf_{K_ε} F^k)] ≤ F(x) = inf_K F.

The proof of Eq. (20) is complete.

We now prove Eq. (21). Let x^k_ε ∈ K_ε and F^k(x^k_ε) ≤ inf_{K_ε} F^k + δ_k, δ_k ↓ 0. Denote X_ε = lim sup_k x^k_ε ⊂ K_ε. Let ε_m ↓ 0, x_{ε_m} ∈ X_{ε_m}, and x_{ε_m} → x ∈ K as m → ∞.


By construction of X_{ε_m}, for each fixed m there exist sequences x^s_{ε_m} → x_{ε_m} (as s → ∞) satisfying

F(x_{ε_m}) ≤ lim inf_s F^{k^s_m}(x^s_{ε_m}) ≤ lim inf_s (inf_{K_{ε_m}} F^{k^s_m}) ≤ lim sup_k (inf_{K_{ε_m}} F^k).

Due to the lower semicontinuity of F and Eq. (20) we obtain

F(x) ≤ lim inf_{m→∞} F(x_{ε_m}) ≤ lim inf_{m→∞} [lim sup_k (inf_{K_{ε_m}} F^k)] = inf_K F.

Hence x ∈ argmin_K F, which proves Eq. (21). □

Proof of Theorem 4.3. Let x be a local minimizer of F on K. For a sufficiently small compact neighborhood V of x, define

φ(y) = F(y) + ‖y − x‖².

The function φ achieves its global minimum on (K ∩ V) at x. Consider also the averaged functions

φ_k(y) = F_k(y) + ‖y − x‖²,

where F_k are averaged functions associated with F. Ermoliev et al. [9] show that (i) the functions φ_k are continuously differentiable, (ii) they epi-converge to φ relative to K ∩ V, and (iii) their global minima z^k on K ∩ V converge to x as k → ∞. For sufficiently large k, the following necessary optimality condition is satisfied:

−∇φ_k(z^k) ∈ N̂_{K∩V}(z^k).

If ∇F_{k_m}(z^{k_m}) = 0 for some {z^{k_m} → x}, then also 0̄ ∈ −∂̄_ψ F(x) and 0̄ ∈ N̄_K(x). If ∇F_{k_m}(z^{k_m}) → g ≠ 0 for some {z^{k_m} → x}, then

(g/‖g‖, ‖g‖) ∈ ∂̄_ψ F(x)

and

(−g/‖g‖, ‖g‖) ∈ N̄_K(x).

If lim sup_k ‖∇F_k(z^k)‖ = +∞, then for some {z^{k_m} → x},

∇F_{k_m}(z^{k_m}) / ‖∇F_{k_m}(z^{k_m})‖ → g,   ‖∇F_{k_m}(z^{k_m})‖ → +∞,

and (g, +∞) ∈ ∂̄_ψ F(x), (−g, +∞) ∈ N̄_K(x). □

References

[1] F. Archetti, B. Betrò, Convex programming via stochastic regularization, Quaderni del Dipartimento di Ricerca Operativa e Scienze Statistiche, N 17, Università di Pisa, 1975.

[2] B.D. Batuhtin, L.A. Maiboroda, Optimization of Discontinuous Functions, Nauka, Moscow, 1984 [in Russian].

[3] H. Brooks, The topology of surprises in technology, institutions and development, in: W.C. Clark, R.E. Munn (Eds.), Sustainable Development of the Biosphere, Cambridge University Press, Cambridge, UK, 1985.

[4] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[5] W.C. Clark, On the practical implication of the carbon dioxide question, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1985.

[6] P.A. Dorofeev, A scheme of iterative minimization methods, U.S.S.R. Computational Mathematics and Mathematical Physics 26 (2) (1986) 131-136 [in Russian].

[7] Yu.M. Ermoliev, A. Gaivoronski, Stochastic programming techniques for optimization of discrete event systems, Annals of Operations Research 39 (1992) 120-135.

[8] Yu.M. Ermoliev, V.I. Norkin, On nonsmooth problems of stochastic systems optimization, WP-95-96, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1995.

[9] Yu.M. Ermoliev, V.I. Norkin, R.J-B. Wets, The minimization of semicontinuous functions: mollifier subgradients, SIAM Journal on Control and Optimization 33 (1) (1995) 149-167.

[10] Yu.M. Ermoliev, E.A. Nurminski, Limit extremal problems, Kibernetika 4 (1973) 130-132 [in Russian].

[11] Yu.M. Ermoliev, R.J-B. Wets (Eds.), Numerical Techniques for Stochastic Optimization, Springer-Verlag, Berlin, 1988.


[12] Yu.M. Ermoliev, S. Uryas'ev, J. Wessels, On optimization of dynamical material flow systems using simulation, WP-92-76, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1992.

[13] P. Glasserman, Gradient Estimation via Perturbation Analysis, Kluwer, Norwell, MA, 1991.

[14] A.M. Gupal, On a method for the minimization of almost differentiable functions, Kibernetika 1 (1977) 114-116 [in Russian; English translation in: Cybernetics 13 (1) (1977) 115-117].

[15] A.M. Gupal, Stochastic Methods for Solving Nonsmooth Extremal Problems, Naukova Dumka, Kiev, 1979 [in Russian].

[16] A.M. Gupal, V.I. Norkin, An algorithm for minimization of discontinuous functions, Kibernetika 2 (1977) 73-75 [in Russian; English translation in: Cybernetics 13 (2) (1977) 220-222].

[17] Y.C. Ho, X.R. Cao, Discrete Event Dynamic Systems and Perturbation Analysis, Kluwer, Norwell, MA, 1991.

[18] Y.C. Ho, Heuristics, rules of thumb, and the 80/20 proposition, IEEE Transactions on Automatic Control 39 (5) (1994) 1025-1027.

[19] C.S. Holling, Resistance of ecosystems: local surprise and global change, in: W.C. Clark, R.E. Munn (Eds.), Sustainable Development of the Biosphere, Cambridge University Press, Cambridge, UK, 1985.

[20] V.Ya. Katkovnik, Linear Estimates and Stochastic Optimization Problems, Nauka, Moscow, 1976 [in Russian].

[21] V.Ya. Katkovnik, Yu. Kulchitsky, Convergence of a class of random search algorithms, Automation and Remote Control 8 (1972) 1321-1326 [in Russian].

[22] J. Kreimer, R.Y. Rubinstein, Nondifferentiable optimization via smooth approximation: general analytical approach, Annals of Operations Research 39 (1992) 97-119.

[23] N.K. Krivulin, Optimization of dynamic discrete event systems through simulations, Candidate Dissertation, Leningrad University, 1990 [in Russian].

[24] D.Q. Mayne, E. Polak, Nondifferentiable optimization via adaptive smoothing, Journal of Optimization Theory and Applications 43 (1984) 601-613.

[25] V.S. Mikhalevich, A.M. Gupal, V.I. Norkin, Methods of Nonconvex Optimization, Nauka, Moscow, 1987 [in Russian].

[26] F. Mirzoahmedov, The queuing system optimization problem and a numerical method for its solution, Kibernetika 3 (1990) 73-75 [in Russian; English translation in: Cybernetics 26 (3) (1990) 405-408].

[27] V.I. Norkin, On nonlocal algorithms for optimization of nonsmooth functions, Kibernetika 5 (1978) 75-79 [in Russian; English translation in: Cybernetics 14 (5) (1978) 704-707].

[28] G. Oortmarssen, Yu.M. Ermoliev, Stochastic optimization of screening strategies for preventing irreversible changes, WP-94-124, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1994.

[29] M.L. Parry, Climate Change, Agriculture and Settlement, Dawson, Folkestone, UK, 1978.

[30] R.T. Rockafellar, R.J-B. Wets, Variational systems, an introduction, in: G. Salinetti (Ed.), Multifunctions and Integrands, Lecture Notes in Mathematics, vol. 1091, Springer-Verlag, Berlin, 1984.

[31] R.T. Rockafellar, R.J-B. Wets, Cosmic convergence, in: A. Ioffe, M. Marcus, S. Reich (Eds.), Optimization and Nonlinear Analysis, Pitman Research Notes in Mathematics Series 244, Longman Scientific & Technical, Essex, UK, 1991, pp. 249-272.

[32] R.Y. Rubinstein, A. Shapiro, The Optimization of Discrete Event Dynamic Systems by the Score Function Method, Wiley, New York, 1993.

[33] R. Suri, Perturbation analysis: the state of the art and research issues explained via the GI/G/1 queue, Proceedings of the IEEE 77 (1) (1989) 114-137.

[34] D.B. Yudin, Qualitative methods for analysis of complex systems I, Izvestiya Akademii Nauk SSSR, Tekhnicheskaya Kibernetika 1 (1965) 3-14 [in Russian].

[35] T.M.L. Wigley, Impact of extreme events, Nature 316 (1985) 106-107.