An unconstrained minimization approach to the solution of optimization problems with simple bounds


Francisco Facchinei and Stefano Lucidi
Università di Roma "La Sapienza"
Dipartimento di Informatica e Sistemistica
Via Buonarroti 12, 00185 Roma, Italy
e-mail (Facchinei): [email protected]
e-mail (Lucidi): [email protected]

Abstract: A new method for the solution of minimization problems with simple bounds is presented. Global convergence of a general scheme requiring the solution of a single linear system at each iteration is proved, and a superlinear convergence rate is established without requiring the strict complementarity assumption. The theory presented covers Newton and Quasi-Newton methods, allows rapid changes in the active set estimate, and is based on a smooth unconstrained reformulation of the bound constrained problem.

Key Words: Bound constrained problem, penalty function, Newton method, nonmonotone line search, strict complementarity.

1 Introduction

We are concerned with the solution of simple bound constrained minimization problems of the form

$$\min f(x) \quad \text{s.t.} \quad l \le x \le u, \qquad \text{(PB)}$$

where the objective function $f$ is smooth, $l$ and $u$ are constant vectors, and the inequalities hold componentwise. This paper is the continuation of [13], where a wide class of differentiable exact penalty functions for Problem (PB) was introduced and studied in detail. On the basis of these penalty functions it is possible to define several algorithmic schemes for the solution of Problem (PB). This paper is devoted to the detailed study of one of these schemes, which, from the results reported in [13], appears to be promising. Penalty functions (both differentiable and nondifferentiable ones) for the solution of quadratic, box constrained problems have attracted much attention in the last few years and have proved to be a powerful tool that can lead to efficient algorithms [6, 20, 21, 23, 24, 25, 26]. This work can be seen as an attempt to extend these kinds of results to nonquadratic problems.

Box constrained problems arise often in applications, and some authors even claim that any real-world unconstrained optimization problem is meaningful only if solved subject to box constraints. These facts have motivated considerable research devoted to the development of efficient and reliable solution algorithms, especially in the quadratic case. The development of such algorithms is a challenging task: on the one hand, the appealing structure of the constraints urges researchers to develop ad hoc minimization techniques that take full advantage of this structure; on the other hand, Problem (PB) still retains the main difficulty generally associated with inequality constrained problems, namely the determination of the set of active constraints at the solution.

The algorithms most widely used to solve Problem (PB) fall in the active set category. In this class of methods, at each iteration we have a working set that is supposed to approximate the set of active constraints at the solution and that is iteratively updated. In general, only a single constraint can be added to or deleted from the active set at each iteration, and this can unnecessarily slow down the convergence rate, especially when dealing with large-scale problems. Note, however, that for the special case of Problem (PB) it is possible to envisage algorithms that update the working set more efficiently [17], especially in the quadratic case [9].

Actually, a number of proposals have been made in recent years to design algorithms that quickly identify the correct active set. With regard to Problem (PB), the seminal work is [2] (see also [1]), where it is shown that if the strict complementarity assumption holds, then it is possible, using a projection method, to add to or delete from the current estimated active set many constraints at each iteration and yet find the correct active set in a finite number of steps. This work has motivated many further studies on projection techniques, both for the general linearly constrained case and for the box constrained case (see, e.g., [3], [4], [5], and [10]), and it is safe to say that algorithms in this class are among the most efficient ones for the solution of large-scale, convex, quadratic problems [29], [30].

More recently, trust region algorithms for unconstrained optimization have been successfully extended to handle the presence of bounds on the variables.
The global convergence theory thus developed is very robust [7], [16], and, under appropriate assumptions, it is possible to establish a superlinear convergence rate without requiring strict complementarity [22], [16]. Furthermore, preliminary numerical results on small, dense problems [8], [16] show that these methods are effective and suggest that they are well suited to large-scale problems. Another algorithm, also based on a trust region philosophy but in connection with a nonsmooth merit function, is proposed in [34]. A major difference between this latter algorithm and the techniques considered so far is that the iterates generated are not forced to remain feasible throughout.

We finally mention that interior point methods for the solution of Problem (PB) are currently an active field of research and that some interesting theoretical results can be obtained in this framework; yet computational experience with this class of methods is still very limited (see, however, [31]).

In this paper we propose a new general scheme for the solution of Problem (PB) which does not fit in any of the categories considered above. At each iteration a linear system is solved to compute a search direction. This linear system, whose definition is based on a powerful active set identification technique, can be viewed as the system arising from the application of the Newton method to the solution of the Kuhn-Tucker conditions for Problem (PB). To globalize this Newton-type algorithm it is possible to employ a nonmonotone line search stabilization technique ([19]) in conjunction with a simple continuously differentiable exact penalty function for Problem (PB) whose properties were studied in [13]. Differentiable penalty functions are often blamed for being too computationally expensive; however, the one we employ in this paper takes full advantage of the structure of Problem (PB) and is not expensive. Furthermore, thanks to the nonmonotone stabilization technique, we have to resort to the penalty function's aid very seldom, and in most iterations we do not even compute its value.

It is worth mentioning the following points.

(a) A complete global convergence theory is established for the proposed general scheme, which covers Newton and Quasi-Newton algorithms.

(b) It is shown that this general scheme does not prevent superlinear convergence, in the sense that if a step length of one along the search direction yields superlinear convergence then, without requiring strict complementarity, the step length of one is eventually accepted.

(c) Rapid changes in the active set estimate are allowed.

(d) The points generated by the algorithm at each iteration need not be feasible.

(e) The main computational burden per iteration is given by the solution of a square linear system whose dimension is equal to the number of variables estimated to be non-active.

(f) A particular Newton-type algorithm is described which falls in the general scheme of point (a) and for which it is possible to establish, under the strong second order sufficient condition, but without requiring strict complementarity, a superlinear convergence rate.

Numerical results obtained with the algorithm of point (f) were reported in [13], where it was shown that our approach is viable in practice, at least for small-to-medium-size problems. We are currently investigating truncated versions of this algorithm with the aim of tackling large-scale problems as well; we shall report on this topic in a future paper. Regarding point (d), we note that the possible infeasibility of the points generated may constitute a limitation if the bounds are "hard", but often this is not the case, and the possibility to violate some of the constraints may give additional, beneficial flexibility.

The paper is organized as follows. In the next section some basic definitions and assumptions are stated. In Section 3 the main properties of a differentiable merit function for Problem (PB) are recalled. Sections 4, 5, and 6 contain a detailed exposition of the algorithm and an analysis of its main properties (some relevant lengthy proofs, which are extensions of known results for unconstrained minimization problems, are collected in the Appendix).
Conclusions and directions for future research are outlined in Section 7.

Regarding notation: if $M$ is an $n \times n$ matrix with rows $M_i$, $i = 1, \ldots, n$, and if $I$ and $J$ are index sets such that $I, J \subseteq \{1, \ldots, n\}$, we denote by $M_I$ the $|I| \times n$ submatrix of $M$ consisting of

rows $M_i$, $i \in I$, and we denote by $M_{I,J}$ the $|I| \times |J|$ submatrix of $M$ consisting of the elements $M_{i,j}$, $i \in I$, $j \in J$. If $w$ is an $n$-vector, we denote by $w_I$ the subvector with components $w_i$, $i \in I$, and we denote by $\mathrm{Diag}[w_i]$ the $n \times n$ diagonal matrix with diagonal elements $w_i$. A superscript $k$ is used to indicate iteration numbers; furthermore, we shall often omit the arguments and write, for example, $f^k$ instead of $f(x^k)$. Finally, we indicate by $E$ the $n \times n$ identity matrix and by $\|\cdot\|$ the Euclidean norm.

2 Problem formulation and preliminaries

For convenience we recall Problem (PB):

$$\min f(x) \quad \text{s.t.} \quad l \le x \le u. \qquad \text{(PB)}$$

For simplicity we assume that the objective function $f : \mathbb{R}^n \to \mathbb{R}$ is three times continuously differentiable (even if weaker assumptions can be used, see Remark 6.1) and that $l_i < u_i$ for every $i = 1, \ldots, n$. Note that $-\infty$ and $+\infty$ are admitted values for $l_i$ and $u_i$ respectively, i.e. we also consider the case in which some (possibly all) bounds are not present. In the sequel we indicate by $\mathcal{F}$ the feasible set of Problem (PB), that is,

$$\mathcal{F} := \{x \in \mathbb{R}^n : l \le x \le u\}. \qquad (1)$$

Let $\alpha \in \mathbb{R}^n$ and $\beta \in \mathbb{R}^n$ be two fixed vectors of positive constants, and let $x^a$ and $x^b$ be two feasible points such that $f(x^a) \ne f(x^b)$. Without loss of generality we assume that $f(x^a) < f(x^b)$. The algorithms proposed in this paper generate, starting from $x^a$, a sequence of points which belong to the following open set:

$$S := \{x \in \mathbb{R}^n : l - \alpha < x < u + \beta,\ f(x) < f(x^b)\}.$$

Roughly speaking, $x^a$ is the starting point, while $x^b$ determines the maximum value which can be taken by the objective function at the points generated by the algorithm. We remark that not every point produced by the algorithm we propose is feasible; feasibility is only ensured in the limit. Note also that $\alpha$ and $\beta$ are arbitrarily fixed before starting the algorithm and never changed during the minimization process.

We now introduce an assumption that is needed to guarantee that no unbounded sequences are produced by the minimization process. This hypothesis plays the same role as the compactness assumption on the level sets of the objective function in the unconstrained case. We note that this assumption (or a similar one) is needed by any standard algorithm which guarantees the existence of a limit point.

Assumption 1 The set $S$ is bounded.

Assumption 1 is automatically satisfied in the following cases:

- all the variables have finite lower and upper bounds;
- $f(x)$ is radially unbounded, that is, $\lim_{\|x\| \to \infty} f(x) = +\infty$.

Observe also that in the unconstrained case Assumption 1 is equivalent to the standard compactness hypothesis on the level sets of the objective function.

In the sequel of this paper we shall consider in detail the results only for the case in which all the variables are box constrained, i.e. the case in which no $l_i$ is $-\infty$ and no $u_i$ is $+\infty$. The

extension to the general case is trivial and, therefore, we omit it. With this assumption, the KKT conditions for $\bar{x}$ to solve Problem (PB) are

$$\nabla f(\bar{x}) - \bar{\lambda} + \bar{\mu} = 0, \quad (l - \bar{x})'\bar{\lambda} = 0, \quad (\bar{x} - u)'\bar{\mu} = 0, \quad \bar{\lambda} \ge 0, \quad \bar{\mu} \ge 0, \quad l \le \bar{x} \le u, \qquad (2)$$

where $\bar{\lambda} \in \mathbb{R}^n$ and $\bar{\mu} \in \mathbb{R}^n$ are the KKT multipliers. Strict complementarity is said to hold at the KKT triplet $(\bar{x}, \bar{\lambda}, \bar{\mu})$ if $\bar{x}_i = l_i$ implies $\bar{\lambda}_i > 0$ and $\bar{x}_i = u_i$ implies $\bar{\mu}_i > 0$.

An equivalent way to write the KKT conditions is the following one:

$$l \le \bar{x} \le u, \qquad l_i < \bar{x}_i < u_i \implies \nabla f(\bar{x})_i = 0, \qquad \bar{x}_i = l_i \implies \nabla f(\bar{x})_i \ge 0, \qquad \bar{x}_i = u_i \implies \nabla f(\bar{x})_i \le 0. \qquad (3)$$

In this case the strict complementarity assumption corresponds to having $\nabla f(\bar{x})_i > 0$ and $\nabla f(\bar{x})_i < 0$ in the second and third implications of (3), respectively.

It is also possible to give second order sufficient conditions of optimality for Problem (PB). The most common is the KKT second order sufficient condition, see e.g. [1]. However, in order to prove a superlinear convergence rate without assuming strict complementarity, we shall employ a stronger condition, known as the strong second order sufficient condition. This condition has already been employed, with the same purpose, in [22] (see also [33]).

Condition 1 Let $(\bar{x}, \bar{\lambda}, \bar{\mu})$ be a KKT triplet for Problem (PB). We say that the strong second order sufficient condition holds at $\bar{x}$ if

$$z'\nabla^2 f(\bar{x})z > 0, \qquad \forall z \ne 0 \text{ such that } z_i = 0 \ \forall i : \bar{\lambda}_i > 0 \text{ or } \bar{\mu}_i > 0.$$

We note that the strong second order condition boils down to the KKT second order sufficient condition if the strict complementarity assumption holds. In general, however, Condition 1 is stronger than the KKT second order sufficient condition in that it requires positive definiteness of the Hessian of the objective function on a larger region.

3 A differentiable exact penalty function for Problem (PB)

In this section we introduce a differentiable exact penalty function for Problem (PB). The penalty function belongs to a more general class of penalty functions studied in [13]. Here we report only some very basic facts on this penalty function; the interested reader is referred to [13] for a more complete discussion.

The penalty function is given by

$$P(x, \varepsilon) := f(x) + \sum_{i=1}^n \left[\lambda_i(x)\, r_i(x, \varepsilon) + \frac{1}{\varepsilon}\,\frac{r_i(x, \varepsilon)^2}{c(x)\, a_i(x)}\right] + \sum_{i=1}^n \left[\mu_i(x)\, s_i(x, \varepsilon) + \frac{1}{\varepsilon}\,\frac{s_i(x, \varepsilon)^2}{c(x)\, b_i(x)}\right], \qquad (4)$$

where

$$r_i(x, \varepsilon) := \max\left[l_i - x_i,\ -\frac{\varepsilon}{2}\, c(x)\, a_i(x)\, \lambda_i(x)\right], \qquad s_i(x, \varepsilon) := \max\left[x_i - u_i,\ -\frac{\varepsilon}{2}\, c(x)\, b_i(x)\, \mu_i(x)\right], \qquad (5)$$

and where $a_i$, $b_i$ and $c$ are barrier functions, see [13]:

$$a_i(x) := \alpha_i - l_i + x_i, \qquad b_i(x) := \beta_i + u_i - x_i, \qquad c(x) := f(x^b) - f(x), \qquad (6)$$

while $\lambda_i(x)$ and $\mu_i(x)$ are, according to the definition given in [15], multiplier functions:

$$\lambda_i(x) = \frac{(x_i - u_i)^2}{(x_i - u_i)^2 + (l_i - x_i)^2}\, \nabla f_i(x), \qquad i = 1, \ldots, n, \qquad (7)$$

$$\mu_i(x) = -\frac{(l_i - x_i)^2}{(x_i - u_i)^2 + (l_i - x_i)^2}\, \nabla f_i(x), \qquad i = 1, \ldots, n. \qquad (8)$$

The multiplier functions are obviously continuously differentiable, and it is trivial to verify, see [13], that if $(\bar{x}, \bar{\lambda}, \bar{\mu})$ is a Kuhn-Tucker triplet for Problem (PB) then $\lambda(\bar{x}) = \bar{\lambda}$ and $\mu(\bar{x}) = \bar{\mu}$.

The penalty function depends, as usual, on a positive parameter $\varepsilon$; furthermore, it is defined only on $S$ (see Section 2). We note that, on the boundary of $S$, at least one of the terms $a_i(x)$, $b_i(x)$ and $c(x)$ goes to zero, and this causes the level sets of the penalty function to be compact. In particular, this implies that a minimization algorithm applied to the penalty function will never generate unbounded sequences. A detailed study of the properties of $P(x, \varepsilon)$ can be found in [13]. It can be proved that, for sufficiently small values of the penalty parameter $\varepsilon$, there is a one-to-one correspondence between (unconstrained) stationary and minimum points of the penalty function on $S$ and stationary and minimum points of Problem (PB). Hence, we can solve Problem (PB) by performing a single, unconstrained minimization of $P(x, \varepsilon)$, provided that $\varepsilon$ is small enough. From this point of view, another important feature of the penalty function $P(x, \varepsilon)$ is that, in spite of the terms (5), it is continuously differentiable on $S$, so that standard, efficient methods for unconstrained smooth minimization can be employed. The gradient of $P(x, \varepsilon)$ is given by (see [13]):

$$\begin{aligned}
\nabla P(x, \varepsilon) = {} & -\frac{1}{\varepsilon c(x)}\,\mathrm{Diag}\!\left[\frac{1}{a_i(x)}\right]\mathrm{Diag}\!\left[2 + \frac{r_i(x, \varepsilon)}{a_i(x)}\right] r(x, \varepsilon) \\
& + \frac{1}{\varepsilon c(x)}\,\mathrm{Diag}\!\left[\frac{1}{b_i(x)}\right]\mathrm{Diag}\!\left[2 + \frac{s_i(x, \varepsilon)}{b_i(x)}\right] s(x, \varepsilon) \\
& + \nabla\lambda(x)\, r(x, \varepsilon) + \frac{1}{\varepsilon c(x)^2}\left[r(x, \varepsilon)'\,\mathrm{Diag}\!\left[\frac{1}{a_i(x)}\right] r(x, \varepsilon)\right]\nabla f(x) \\
& + \nabla\mu(x)\, s(x, \varepsilon) + \frac{1}{\varepsilon c(x)^2}\left[s(x, \varepsilon)'\,\mathrm{Diag}\!\left[\frac{1}{b_i(x)}\right] s(x, \varepsilon)\right]\nabla f(x). \qquad (9)
\end{aligned}$$

We remark that the terms of the gradient have been rearranged using the expression of the multiplier functions, so that the gradient of $f(x)$ does not appear explicitly in the above formula.

We finally report the following technical result, which will be used in the sequel and which follows from [13, Proposition 2.3].

Proposition 3.1 For every $\varepsilon > 0$ the set $\{x \in S : P(x, \varepsilon) \le P(x^a, \varepsilon)\}$ is closed. Furthermore, there exists a compact set $\mathcal{L}$ such that, for every $\varepsilon > 0$, we have

$$\{x \in S : P(x, \varepsilon) \le P(x^a, \varepsilon)\} \subseteq \mathcal{L} \subset S.$$
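Because every quantity in (4)-(8) is available in closed form, one evaluation of $P(x, \varepsilon)$ costs little more than one evaluation of $f$ and $\nabla f$. The following sketch is a direct transcription of these formulas; it is not taken from the paper, the function and argument names are our own assumptions, and it is meaningful only for $x$ in the open set $S$, where $a_i$, $b_i$ and $c$ are strictly positive.

```python
import numpy as np

def penalty_P(x, eps, f, grad_f, l, u, alpha, beta, f_xb):
    """Illustrative evaluation of P(x, eps) from (4)-(8).

    f, grad_f : callables for f and its gradient (assumed names);
    l, u      : bound vectors; alpha, beta : the fixed positive shifts;
    f_xb      : the value f(x^b) entering c(x).
    Valid only for x in S, where a, b and c below are positive.
    """
    fx, g = f(x), grad_f(x)
    a = alpha - l + x                      # a_i(x), (6)
    b = beta + u - x                       # b_i(x), (6)
    c = f_xb - fx                          # c(x),   (6)
    w = (x - u) ** 2 + (l - x) ** 2        # never zero, since l < u
    lam = (x - u) ** 2 / w * g             # lambda_i(x), (7)
    mu = -(l - x) ** 2 / w * g             # mu_i(x),     (8)
    r = np.maximum(l - x, -0.5 * eps * c * a * lam)   # r_i(x, eps), (5)
    s = np.maximum(x - u, -0.5 * eps * c * b * mu)    # s_i(x, eps), (5)
    return (fx + np.sum(lam * r + r ** 2 / (eps * c * a))
               + np.sum(mu * s + s ** 2 / (eps * c * b)))
```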

The properties of the penalty function reported so far clearly show that we can solve Problem (PB) by unconstrained optimization techniques. However, if one wants to develop a practical algorithm, at least two important questions have to be answered: how to calculate a suitable value of $\varepsilon$, so that, as discussed above, the unconstrained minimization of the penalty function is equivalent to the solution of Problem (PB); and which unconstrained optimization algorithm to employ for the minimization of the penalty function. In [13] we proposed a very general scheme for updating $\varepsilon$ that, coupled with practically any standard unconstrained minimization algorithm, allows us to solve Problem (PB). This scheme has, however, a drawback in that it exploits the structure neither of Problem (PB) nor of the minimization algorithm employed.

The aim of this paper is to present a new algorithm based on $P(x, \varepsilon)$ which answers the two questions raised above in a novel way. In particular, the unconstrained minimization algorithm we use is a nonmonotone line-search scheme which uses a search direction that is strongly related to the KKT conditions of Problem (PB). Since this algorithm is closely tailored to the structure of the problem, we can use a rule for updating the penalty parameter different from the one proposed in a much broader context in [13], and which, for the problem at hand, is much more efficient from a practical point of view.

To facilitate the reader, we split the presentation of the new scheme into three parts. In the next section we introduce a general algorithm model for the minimization of $P(x, \varepsilon)$ which is based on the one proposed in [19]. We show that, for every fixed value of the penalty parameter $\varepsilon$, this algorithm is globally convergent to solutions of Problem (PB), provided that the search direction satisfies suitable, nonstandard assumptions. In Section 5 we show how it is possible, by considering an approximation of the KKT conditions, to compute cheaply and efficiently a search direction which, for sufficiently small values of the penalty parameter $\varepsilon$, satisfies all the assumptions required for the global convergence of the nonmonotone algorithm of Section 4. Finally, in Section 6 we describe the overall algorithm, which includes an automatic updating scheme for the penalty parameter $\varepsilon$ and is based on the results of the previous two sections. We also study the local convergence properties of the algorithm and show that, under mild assumptions, it is quadratically convergent to solutions of Problem (PB).

4 A nonmonotone algorithm for the minimization of P(x, ε)

In this section we introduce a nonmonotone algorithm for the minimization of $P(x, \varepsilon)$ for a fixed value of the penalty parameter $\varepsilon$. We show that, if the search direction employed satisfies certain nonstandard conditions, then every limit point of the sequence produced is both a stationary point of $P(x, \varepsilon)$ and a solution of Problem (PB).
In the next two sections we shall show how a direction $d$ satisfying these nonstandard assumptions can be computed.

The algorithm we consider is an iterative process of the form

$$x^{k+1} = x^k + \alpha^k d^k, \qquad (10)$$

where $x^0 = x^a \in \mathcal{F}$ is the starting point, $d^k$ is the search direction and $\alpha^k$ is the stepsize. In order to establish the convergence properties of the algorithm, we assume that the following assumption on the direction $d^k$ is always satisfied.

Assumption 2 The search direction $d^k \in \mathbb{R}^n$ satisfies the following conditions:

(a) $d^k = 0$ if and only if $x^k$ is a stationary point of Problem (PB);

(b) if $x^k \to \bar{x}$ and $d^k \to 0$, then $\bar{x}$ is a stationary point of Problem (PB).

Furthermore, in certain specific iterations (see below), we shall also assume that the following condition is fulfilled by the direction.

Assumption 3 There exists a positive number $\gamma$ such that the search direction $d^k \in \mathbb{R}^n$ satisfies the condition

$$\nabla P(x^k, \varepsilon)^T d^k \le -\gamma \|d^k\|^2.$$

Using these assumptions we can now introduce a general algorithm model for the solution of Problem (PB) which is strongly based on the NonMonotone stabilization algorithm proposed in [19] and which includes, as particular cases, many known linesearch algorithms. The algorithm model is an iterative process of the form (10) that includes different strategies for enforcing global convergence without requiring a monotonic reduction of the merit function. This may be reasonable in many situations. For example, in our case, if the sequence $\{\|d^k\|\}$ goes to zero, then, by Assumption 2 (b), the corresponding sequence of points $\{x^k\}$ is converging to stationary points. Hence an effective criterion to check whether convergence is taking place is to test whether the norm of the direction is decreasing. Thus the "normal step" of the algorithm is to check whether the norm of the direction has "sufficiently" decreased. If it has, the algorithm accepts the unit step size ($\alpha^k = 1$) without computing the merit function. Otherwise, after a check on the objective function value (and a possible "backtrack", see below), the algorithm performs a nonmonotone Armijo-type linesearch procedure [18]. In order to prevent the sequence of points from leaving the region of interest (with possible convergence to local maxima or occurrence of overflows), a "function step" is performed at least every $N \ge 1$ iterations. In a "function step" the objective function is computed and its value is compared with an adjustable reference value $R$. If the value of the objective function is smaller than the reference value, the algorithm proceeds as in a normal step, described above. Otherwise the algorithm "backtracks" by restoring the vector of variables to the last point where the objective function was smaller than the reference value $R$.

The linesearch procedure is the following (a code sketch is given after the backtrack procedure below):

linesearch: If necessary, modify $d^k$ so that Assumption 3 is satisfied. Find the smallest integer $i = 0, 1, \ldots$ such that

$$x^k + 2^{-i} d^k \in S \qquad (11)$$

$$P(x^k + 2^{-i} d^k, \varepsilon) \le R + \gamma\, 2^{-i}\, \nabla P(x^k, \varepsilon)^T d^k, \qquad (12)$$

set $\alpha^k = 2^{-i}$, $\ell = k + 1$ and update $R$.

Note that this linesearch procedure is invoked only in certain steps and that only in these steps is Assumption 3 required to hold.

In the description of the algorithm that follows, $\ell$ denotes the last iteration index at which the merit function has been evaluated and the reference value modified. The precise form of the backtracking procedure can then be described as follows:

backtrack: Replace $x^k$ by $x^\ell$ and set $k = \ell$.
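For concreteness, here is a small sketch of the linesearch (11)-(12) with halving stepsizes $2^{-i}$. It is not taken from the paper: the helper names (`P`, `in_S`) and the cap on the number of halvings are our own assumptions.

```python
def linesearch(x, d, eps, P, in_S, R, gPd, gamma=1e-4, max_halvings=60):
    """Sketch of the procedure linesearch: find the smallest i >= 0 with
        x + 2^{-i} d in S                                    (11)
        P(x + 2^{-i} d, eps) <= R + gamma * 2^{-i} * gPd,    (12)
    where gPd = grad P(x, eps)^T d (negative under Assumption 3).
    Returns the stepsize alpha = 2^{-i} and the accepted trial point.
    """
    alpha = 1.0
    for _ in range(max_halvings):
        trial = x + alpha * d
        if in_S(trial) and P(trial, eps) <= R + gamma * alpha * gPd:
            return alpha, trial
        alpha *= 0.5
    raise RuntimeError("no acceptable stepsize found; this should not "
                       "happen when Assumption 3 holds")
```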

Actually, in describing the algorithm we have simplified matters in order to concentrate on the main arguments. In fact, when we try to accept the unit stepsize without computing the merit function value (normal step), we have to check whether $x^k + d^k \in S$, because the merit function is defined only in the open set $S$ and we cannot generate points outside this region. Hence we may have to "scale" $d^k$ in order to ensure the condition $x^k + d^k \in S$. Note that this procedure is standard and common to all the modifications of unconstrained minimization algorithms designed to locate stationary points of an objective function on an open set (see, e.g., [27]). The scaling procedure is:

scale: Find the smallest integer $i = 0, 1, \ldots$ such that

$$x^k + 2^{-i} d^k \in S \qquad (13)$$

and set $d^k = 2^{-i} d^k$.

We can now describe the algorithm in more detail.

NonMonotone Stabilization Algorithm for Box Constrained Problems (NMSB)

Data: Choose $x^0 = x^a \in \mathcal{F}$, $\varepsilon > 0$, $\Delta_0 \ge 0$, $\delta \in (0, 1)$, $\gamma \in (0, 1/2)$ and $N \ge 1$.

Initialization: Set $k = 0$, $j = 0$, $\ell(j) = 0$, and $\Delta = \Delta_0$. Compute $P(x^0, \varepsilon)$ and set $R_j = P(x^0, \varepsilon)$.

Iteration: Compute $d^k$ satisfying Assumption 2. If $\|d^k\| = 0$, stop. If $k \ne \ell + N$, perform an n-step to calculate $\alpha^k$; otherwise perform an f-step to calculate $\alpha^k$. Set $x^{k+1} = x^k + \alpha^k d^k$, $k = k + 1$, and repeat Iteration.

n-step: If $\|d^k\| \le \Delta$ perform (a), otherwise perform (b).

(a) Perform scale and set $\Delta = \delta\Delta$.

(b) Compute $P(x^k, \varepsilon)$. If $P(x^k, \varepsilon) \ge R_j$, perform backtrack and linesearch; otherwise set $\ell(j) = k$, perform update $R_j$, set $j = j + 1$ and perform linesearch.

f-step: Compute $P(x^k, \varepsilon)$. If $P(x^k, \varepsilon) \ge R_j$, perform (c); otherwise perform (d).

(c) Perform backtrack and linesearch.

(d) Set $\ell(j) = k$, perform update $R_j$ and set $j = j + 1$. If $\|d^k\| \le \Delta$, perform scale and set $\Delta = \delta\Delta$; otherwise perform linesearch.

In order to complete the description of the algorithm, we only need to specify the way in which the reference value $R_j$ is updated. To this end we note that the index $j$ is incremented each time we set $\ell(j) = k$, i.e. each time the function is evaluated. Therefore $\{x^{\ell(j)}\}$ is the sequence of points where the merit function is evaluated and $\{R_j\}$ is the sequence of reference values. The reference value is initially set to $P(x^0, \varepsilon)$. Whenever a point $x^{\ell(j)}$ is generated such that $P(x^{\ell(j)}, \varepsilon) < R_j$, the reference value is updated by taking into account the "memory" (i.e. a fixed number $m(j) \le \bar{m}$ of previous values) of the objective function. To be precise, the updating rule for $R_j$ is the following one.

Update $R_j$: Given $\bar{m} \ge 0$, let $m(j + 1)$ be such that

$$m(j + 1) \le \min[m(j) + 1,\ \bar{m}].$$

Choose the value $R_{j+1}$ to satisfy

$$P(x^{\ell(j+1)}, \varepsilon) \le R_{j+1} \le \max_{0 \le i \le m(j+1)} P(x^{\ell(j+1-i)}, \varepsilon). \qquad (14)$$

Note that if, in the procedure Update $R_j$, $\bar{m} = 0$, then each time we update $R_j$ we simply set it to the current penalty function value, so that we perform monotone linesearches. On the other hand, if $\bar{m} > 0$, then it is possible to choose a value of $R_j$ which is larger than the current value of the penalty function, so that we perform nonmonotone linesearches and the penalty function value can increase from one iteration to the next. It turns out that nonmonotone linesearches are a very valuable tool from the numerical point of view [18, 19].

The NMSB algorithm is a very general scheme and encompasses many possible extensions of unconstrained algorithms. For example, if we set $\bar{m} = 0$ and $\Delta_0 = 0$ we obtain the Armijo stabilization algorithm; if we set $\bar{m} > 0$ and $\Delta_0 = 0$ we obtain the box constrained version of the nonmonotone algorithm proposed in [18]. By an almost verbatim repetition of the proof of convergence described in [19] it is easy to show that the following result holds.

Theorem 4.1 Suppose that we generate a sequence $\{x^k\}$ according to the NMSB algorithm described above. Then:

(i) there exists at least one limit point of the sequence $\{x^k\}$;

(ii) every limit point of the sequence $\{x^k\}$ is a KKT point of Problem (PB);

(iii) every limit point $\bar{x}$ of the sequence $\{x^k\}$ is such that $f(\bar{x}) \le f(x^a)$.

The detailed proof of this theorem, which is just an adaptation of some of the proofs in [19], is long and cumbersome, and we therefore report it in the Appendix.

In the statement of Theorem 4.1 we have stressed the properties of the algorithm in terms of properties of Problem (PB). However, we can equivalently see algorithm NMSB as an algorithm for the minimization of the penalty function. From this point of view, we can also see that every accumulation point of the sequence generated by the algorithm is a stationary point of the penalty function. This easily follows from the fact that every KKT point of Problem (PB) is a stationary point of $P(x, \varepsilon)$ for every positive value of $\varepsilon$, see [13].

5 The search direction

In this section we show how to build a direction $d^k$ that satisfies Assumptions 2 and 3. In particular, we define a search direction which satisfies Assumption 2 for every value of $\varepsilon$, while Assumption 3 is fulfilled only if the penalty parameter $\varepsilon$ is sufficiently small. The calculation of $d^k$ is based on an identification technique that uses the simple multiplier functions (7) and (8) and on the solution of KKT-like equations for Problem (PB).

Based on the multiplier functions, we can introduce the following "guesses" of the sets of indices active at their lower or upper bounds:

$$L(x, \varepsilon) := \left\{i : x_i \le l_i + \min\left[\frac{\varepsilon\, c(x)}{2}\, a_i(x)\, \lambda_i(x),\ \frac{u_i - l_i}{3}\right]\right\} \qquad (15)$$

$$U(x, \varepsilon) := \left\{i : x_i \ge u_i - \min\left[\frac{\varepsilon\, c(x)}{2}\, b_i(x)\, \mu_i(x),\ \frac{u_i - l_i}{3}\right]\right\}, \qquad (16)$$

where $\varepsilon$ is the positive parameter used in the penalty function and $a_i(x)$, $b_i(x)$, and $c(x)$ are the barrier functions defined by (6). Furthermore, we indicate by $N(x, \varepsilon)$ the set of indices that are estimated to be non-active:

$$N(x, \varepsilon) := \{1, \ldots, n\} \setminus \left(L(x, \varepsilon) \cup U(x, \varepsilon)\right). \qquad (17)$$

Note that the sets $L(x, \varepsilon)$, $U(x, \varepsilon)$ and $N(x, \varepsilon)$ are pairwise disjoint for every $x$ and for every positive value of $\varepsilon$.

The following theorem shows that these are in fact good estimates, at least in a neighborhood of a KKT triplet of Problem (PB). The validity of this theorem immediately follows from Theorem 2.1 and Remark 2.1 in [15].

Theorem 5.1 Let $(\bar{x}, \bar{\lambda}, \bar{\mu})$ be a KKT triplet for Problem (PB) and let $\bar{\varepsilon}$ be a positive constant. Then, for every $\varepsilon \in (0, \bar{\varepsilon}]$ there exists a neighborhood $\Omega$ of $\bar{x}$ such that, for all $x \in \Omega$, we have

$$\{i : l_i < \bar{x}_i < u_i\} \subseteq N(x, \varepsilon) \subseteq \{i : \bar{\lambda}_i = 0 \text{ and } \bar{\mu}_i = 0\},$$
$$\{i : \bar{\lambda}_i > 0\} \subseteq L(x, \varepsilon) \subseteq \{i : l_i = \bar{x}_i\},$$
$$\{i : \bar{\mu}_i > 0\} \subseteq U(x, \varepsilon) \subseteq \{i : u_i = \bar{x}_i\}. \qquad (18)$$

Moreover, if the strict complementarity assumption holds, then, for all $x \in \Omega$ and $\varepsilon \in (0, \bar{\varepsilon}]$,

$$N(x, \varepsilon) = \{i : l_i < \bar{x}_i < u_i\}, \qquad L(x, \varepsilon) = \{i : l_i = \bar{x}_i\}, \qquad U(x, \varepsilon) = \{i : u_i = \bar{x}_i\}.$$

The direction we shall use in our algorithm, then, is defined as the solution of the system

$$\begin{bmatrix} H^k_{N^k} \\ E_{L^k} \\ E_{U^k} \end{bmatrix} d^k = -\begin{bmatrix} \nabla f^k_{N^k} \\ (x^k - l)_{L^k} \\ (x^k - u)_{U^k} \end{bmatrix}, \qquad (19)$$

where $L^k$, $U^k$ and $N^k$ are given by (15), (16), and (17) respectively, and where $H^k$ is an "approximation" of $\nabla^2 f^k$ satisfying the following assumption.

Assumption 4 The matrices $H^k$ are bounded and a positive $\sigma$ exists such that, for every $k$,

$$\sigma\|z\|^2 \le z' H^k_{N^k,N^k}\, z, \qquad \forall z \in \mathbb{R}^{|N^k|}.$$

By Theorem 5.1 (for every $\varepsilon > 0$), there exists a neighborhood of a KKT point of Problem (PB) satisfying the strong second order condition where the matrices $H^k = \nabla^2 f(x^k)$ satisfy Assumption 4. Furthermore, Assumption 4 obviously guarantees that the direction $d^k$ is well defined, i.e. that the system (19) is uniquely solvable.

Note that the search direction $d^k$ depends on the current approximation to the solution $x^k$, on the matrix $H^k$ and on the value of $\varepsilon$, so that we should write $d(x^k, H^k, \varepsilon)$. However, when there is no possibility of misunderstanding, we shall use the short notation $d^k$, and use the full notation $d(x^k, H^k, \varepsilon)$ only when we want to explicitly stress the dependence of $d^k$ on its arguments or when there can be ambiguities. A small computational sketch of these definitions is given below; the next theorem then shows that Assumption 2 (a) is always satisfied.
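Computationally, forming the estimates (15)-(17) is a componentwise test, and (19) amounts to one linear solve of order $|N^k|$ (point (e) of the Introduction). The sketch below illustrates this; the names are our own assumptions, and we solve the reduced form of (19) (written out as (20) in the proof of Theorem 5.3) instead of assembling the full system.

```python
import numpy as np

def search_direction(x, eps, grad_fx, H, l, u, a, b, c, lam, mu):
    """Sketch of the estimates (15)-(17) and of the direction solving (19).

    a, b, c  : barrier values a_i(x), b_i(x), c(x) from (6);
    lam, mu  : multiplier function values (7)-(8);
    H        : n x n approximation of the Hessian (Assumption 4).
    """
    cap = (u - l) / 3.0
    L = x <= l + np.minimum(0.5 * eps * c * a * lam, cap)   # estimate (15)
    U = x >= u - np.minimum(0.5 * eps * c * b * mu, cap)    # estimate (16)
    N = ~(L | U)                                            # estimate (17)

    d = np.zeros_like(x)
    d[L] = l[L] - x[L]                   # d_L = -(x - l)_L
    d[U] = u[U] - x[U]                   # d_U = -(x - u)_U
    # reduced system: H_NN d_N = -(grad f_N + H_NL d_L + H_NU d_U)
    rhs = -(grad_fx[N]
            + H[np.ix_(N, L)] @ d[L]
            + H[np.ix_(N, U)] @ d[U])
    d[N] = np.linalg.solve(H[np.ix_(N, N)], rhs)
    return d, L, U, N
```

Note that only the rows and columns of $H$ indexed by the non-active estimate enter the factorization, which is what makes the per-iteration cost low when many bounds are estimated active.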

Theorem 5.2 For every $\varepsilon > 0$, for every $x^k$ belonging to $S$, and for every matrix $H$ such that $H_{N^k,N^k}$ is positive definite, $d^k$ is equal to zero if and only if $x^k$ is a stationary point for Problem (PB).

Proof. Suppose that $d^k = 0$. Then, since $d^k$ is the solution of system (19), and taking into account the definition of the multiplier functions and of the index sets $L^k$, $N^k$ and $U^k$, the following implications hold:

$$i \in L^k \implies x^k_i = l_i \text{ and } \nabla f_i(x^k) \ge 0;$$
$$i \in N^k \implies l_i < x^k_i < u_i \text{ and } \nabla f_i(x^k) = 0;$$
$$i \in U^k \implies x^k_i = u_i \text{ and } \nabla f_i(x^k) \le 0,$$

that is, (3) holds.

Suppose now that $x^k$ is a stationary point for Problem (PB). Since $x^k$ is feasible, we have, by (18),

$$(x^k - l)_{L^k} = 0, \quad (x^k - u)_{U^k} = 0, \qquad \forall \varepsilon > 0.$$

Furthermore, by the first equation of (2) and (18) we have that

$$\nabla f(x^k)_{N^k} = 0, \qquad \forall \varepsilon > 0.$$

Hence the right-hand side of system (19) is 0, and the theorem follows by noting that, by the assumption made on $H$, system (19) is nonsingular. □

We now consider Assumption 2 (b).

Theorem 5.3 Let $\{x^k\}$ be a sequence of points such that $x^k \in S$ and such that $\{d(x^k, H^k, \varepsilon)\} \to 0$. Then, for every positive value of $\varepsilon$, every accumulation point $\bar{x}$ of $\{x^k\}$ is a KKT point.

Proof. We can write the solution of system (19) in the following form:

$$d^k_{N^k} = -(H^k_{N^k,N^k})^{-1}\left[\nabla f^k_{N^k} + H^k_{N^k,L^k} d^k_{L^k} + H^k_{N^k,U^k} d^k_{U^k}\right],$$
$$d^k_{L^k} = -(x^k - l)_{L^k}, \qquad (20)$$
$$d^k_{U^k} = -(x^k - u)_{U^k}.$$

Taking into account that the number of subsets of $\{1, \ldots, n\}$ is finite, we can assume, without loss of generality, that the index sets $L^k$, $N^k$, and $U^k$ are constant, so that we can write

$$L(x^k, \varepsilon) = L, \quad N(x^k, \varepsilon) = N, \quad U(x^k, \varepsilon) = U.$$

Passing to the limit in (20), and taking into account that $\{d^k\} \to 0$ by assumption, together with Assumption 4, we get

$$\bar{x}_L = l_L, \quad \bar{x}_U = u_U, \qquad (21)$$

$$\nabla f_N(\bar{x}) = 0. \qquad (22)$$

By (22), (8), (7) we have

$$\lambda_N(\bar{x}) = 0, \quad \mu_N(\bar{x}) = 0.$$

Then, by the definition of the index set $N$, we have

$$l_N \le \bar{x}_N \le u_N, \qquad (23)$$

so that, recalling (21), we conclude that $\bar{x}$ is feasible.

By the definition of $L$, (21), and (7), we have

$$\nabla f_L(\bar{x}) \ge 0. \qquad (24)$$

Analogously, by the definition of $U$, (21), and (8), we also have

$$-\nabla f_U(\bar{x}) \ge 0. \qquad (25)$$

Now, the theorem follows by noting that (21)-(22) and (23)-(25) coincide with (3). □

In the last theorem of this section we show that Assumption 3 can also be satisfied. But there is a difference to be stressed. While it was possible to prove that Assumptions 2 (a) and (b) are satisfied by the search direction defined by (19) for every positive value of the penalty parameter $\varepsilon$, it is possible to satisfy Assumption 3 only for sufficiently small values of the penalty parameter $\varepsilon$. On the other hand, this fact is not unexpected, since there is a complete correspondence between the solutions of Problem (PB) and the unconstrained minimizers of the penalty function only for sufficiently small values of $\varepsilon$.

Theorem 5.4 There exists an $\bar{\varepsilon} > 0$ such that, for every $\{x^k\}$ and $\varepsilon$ such that

(i) $\varepsilon \in (0, \bar{\varepsilon}]$,

(ii) $x^k \in \{x \in S : P(x, \varepsilon) \le P(x^a, \varepsilon)\}$,

the following relation holds:

$$\nabla P(x^k, \varepsilon)'\, d(x^k, H^k, \varepsilon) \le -\gamma\, \|d(x^k, H^k, \varepsilon)\|^2, \qquad (26)$$

for some positive $\gamma$.

Proof. The proof is by contradiction. Assume that the theorem is false; then there exist sequences $\{x^k\}$, $\{\varepsilon^k\}$, $\{H^k\}$, and $\{\gamma^k\}$ such that

$$\varepsilon^k \downarrow 0, \quad \gamma^k \downarrow 0, \quad x^k \in \{x \in S : P(x, \varepsilon^k) \le P(x^a, \varepsilon^k)\},$$

$$\nabla P(x^k, \varepsilon^k)'\, d^k > -\gamma^k \|d^k\|^2. \qquad (27)$$

Furthermore, analogously to the proof of Theorem 5.3, we shall assume, without loss of generality, that the index sets $L^k$, $N^k$, and $U^k$ are constant, namely

$$L(x^k, \varepsilon^k) = L, \quad N(x^k, \varepsilon^k) = N, \quad U(x^k, \varepsilon^k) = U.$$

By (9) we can write

$$\begin{aligned}
(\nabla P^k)'\, d^k = {} & -\frac{1}{\varepsilon^k c^k}\,(r^k)'\,\mathrm{Diag}\!\left[2 + \frac{r^k_i}{a^k_i}\right]\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] d^k \\
& + \frac{1}{\varepsilon^k c^k}\,(s^k)'\,\mathrm{Diag}\!\left[2 + \frac{s^k_i}{b^k_i}\right]\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] d^k + (r^k)'(\nabla\lambda^k)'\, d^k \qquad (28) \\
& + \frac{1}{\varepsilon^k (c^k)^2}\,(r^k)'\,\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] r^k\,(\nabla f^k)'\, d^k + (s^k)'(\nabla\mu^k)'\, d^k \\
& + \frac{1}{\varepsilon^k (c^k)^2}\,(s^k)'\,\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] s^k\,(\nabla f^k)'\, d^k.
\end{aligned}$$

By Assumption 1 and by (15), (16) and (19) we have, for $\varepsilon^k$ small enough,

$$i \in L \implies r^k_i = l_i - x^k_i = d^k_i, \quad s^k_i = -\tfrac{\varepsilon^k}{2} c^k b^k_i \mu^k_i;$$
$$i \in N \implies r^k_i = -\tfrac{\varepsilon^k}{2} c^k a^k_i \lambda^k_i, \quad s^k_i = -\tfrac{\varepsilon^k}{2} c^k b^k_i \mu^k_i; \qquad (29)$$
$$i \in U \implies r^k_i = -\tfrac{\varepsilon^k}{2} c^k a^k_i \lambda^k_i, \quad s^k_i = x^k_i - u_i = -d^k_i,$$

so that we can rewrite (28) as

$$\begin{aligned}
(\nabla P^k)'\, d^k = {} & -\frac{1}{\varepsilon^k c^k}(d^k_L)'\,\mathrm{Diag}\!\left[2 + \frac{l_i - x^k_i}{a^k_i}\right]_{LL}\!\left(\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] d^k\right)_{\!L} \\
& + \frac{1}{2}(\lambda^k_N)'\,\mathrm{Diag}\!\left[2 - \frac{\varepsilon^k c^k \lambda^k_i}{2}\right]_{NN} d^k_N + \frac{1}{2}(\lambda^k_U)'\,\mathrm{Diag}\!\left[2 - \frac{\varepsilon^k c^k \lambda^k_i}{2}\right]_{UU} d^k_U \\
& - \frac{1}{2}(\mu^k_L)'\,\mathrm{Diag}\!\left[2 - \frac{\varepsilon^k c^k \mu^k_i}{2}\right]_{LL} d^k_L - \frac{1}{2}(\mu^k_N)'\,\mathrm{Diag}\!\left[2 - \frac{\varepsilon^k c^k \mu^k_i}{2}\right]_{NN} d^k_N \\
& - \frac{1}{\varepsilon^k c^k}(d^k_U)'\,\mathrm{Diag}\!\left[2 + \frac{x^k_i - u_i}{b^k_i}\right]_{UU}\!\left(\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] d^k\right)_{\!U} + (d^k_L)'\left[(\nabla\lambda^k)'\, d^k\right]_L \qquad (30) \\
& - \frac{\varepsilon^k c^k}{2}\left(\mathrm{Diag}[a^k_i]\,\lambda^k\right)_N' \left[(\nabla\lambda^k)'\, d^k\right]_N - \frac{\varepsilon^k c^k}{2}\left(\mathrm{Diag}[a^k_i]\,\lambda^k\right)_U' \left[(\nabla\lambda^k)'\, d^k\right]_U \\
& - \frac{\varepsilon^k c^k}{2}\left(\mathrm{Diag}[b^k_i]\,\mu^k\right)_L' \left[(\nabla\mu^k)'\, d^k\right]_L - \frac{\varepsilon^k c^k}{2}\left(\mathrm{Diag}[b^k_i]\,\mu^k\right)_N' \left[(\nabla\mu^k)'\, d^k\right]_N \\
& - (d^k_U)'\left[(\nabla\mu^k)'\, d^k\right]_U + \frac{1}{\varepsilon^k (c^k)^2}\left[(r^k)'\,\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] r^k + (s^k)'\,\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] s^k\right](\nabla f^k)'\, d^k.
\end{aligned}$$

We now make the following readily verifiable observations.

(i) Each element $\left[2 + \frac{l_i - x^k_i}{a^k_i}\right]$, $i \in L$, and $\left[2 + \frac{x^k_i - u_i}{b^k_i}\right]$, $i \in U$, is greater than 1. These elements appear in some of the diagonal matrices in the formula above.

(ii) Taking into account (19) we can write

$$(\nabla f^k)_N = -H^k_{N,N} d^k_N - H^k_{N,L} d^k_L - H^k_{N,U} d^k_U,$$

and hence we have, for $i \in N$,

$$(\lambda^k)_i = -\frac{(x^k - u)^2_i}{(x^k - u)^2_i + (l - x^k)^2_i}\left(H^k_{N,N} d^k_N + H^k_{N,L} d^k_L + H^k_{N,U} d^k_U\right)_i,$$

$$(\mu^k)_i = \frac{(l - x^k)^2_i}{(x^k - u)^2_i + (l - x^k)^2_i}\left(H^k_{N,N} d^k_N + H^k_{N,L} d^k_L + H^k_{N,U} d^k_U\right)_i.$$

(iii) By (8), (7) and (29) we can write

$$(\mu^k)_i = -\frac{(d^k)^2_i}{(x^k - u)^2_i + (l - x^k)^2_i}\,\nabla f(x^k)_i \quad \text{for } i \in L, \qquad (\lambda^k)_i = \frac{(d^k)^2_i}{(x^k - u)^2_i + (l - x^k)^2_i}\,\nabla f(x^k)_i \quad \text{for } i \in U.$$

(iv) The quantities $\|x^k - l\|$, $\|x^k - u\|$, $\|\lambda(x^k)\|$, $\|\mu(x^k)\|$, $\|\nabla\lambda(x^k)\|$, $\|\nabla\mu(x^k)\|$ and $\|\nabla f(x^k)\|$ are bounded.

(v) By Assumption 4 and (19), since $x^k \in S$, the sequence $\{\|d^k\|\}$ is bounded.

Then, taking into account (30) and points (i)-(iv) above, we can assert that, for $\varepsilon^k$ small enough,

$$\begin{aligned}
(\nabla P^k)'\, d^k \le {} & -\frac{K_1}{\varepsilon^k c^k}\|d^k_L\|^2 - K_2\|d^k_N\|^2 - \frac{K_3}{\varepsilon^k c^k}\|d^k_U\|^2 + K_4\|d^k_L\|^2 + K_5\|d^k_U\|^2 \\
& + K_6\|d^k_L\|\|d^k_N\| + K_7\|d^k_N\|\|d^k_U\| + K_8\|d^k_L\|\|d^k_U\| \\
& + \frac{1}{\varepsilon^k (c^k)^2}\left[(d^k_L)'\,\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] d^k_L + (d^k_U)'\,\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] d^k_U\right](\nabla f^k)'\, d^k + \varepsilon^k K_9\|d^k\|^2, \qquad (31)
\end{aligned}$$

where $K_1, \ldots, K_9$ are positive constants. Equations (31) and (27) imply that, for $k$ sufficiently large,

$$\begin{aligned}
0 \le {} & \gamma^k\|d^k\|^2 - \frac{K_1}{\varepsilon^k c^k}\|d^k_L\|^2 - K_2\|d^k_N\|^2 - \frac{K_3}{\varepsilon^k c^k}\|d^k_U\|^2 + K_4\|d^k_L\|^2 + K_5\|d^k_U\|^2 \\
& + K_6\|d^k_L\|\|d^k_N\| + K_7\|d^k_N\|\|d^k_U\| + K_8\|d^k_L\|\|d^k_U\| \\
& + \frac{1}{\varepsilon^k (c^k)^2}\left[(d^k_L)'\,\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] d^k_L + (d^k_U)'\,\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] d^k_U\right](\nabla f^k)'\, d^k + \varepsilon^k K_9\|d^k\|^2 \qquad (32) \\
= {} & \underbrace{-\left(\|d^k_L\|, \|d^k_N\|, \|d^k_U\|\right) Q^k \left(\|d^k_L\|, \|d^k_N\|, \|d^k_U\|\right)' + \varepsilon^k K_9\|d^k\|^2}_{(*)} \\
& + \underbrace{\frac{1}{\varepsilon^k (c^k)^2}\left[(d^k_L)'\,\mathrm{Diag}\!\left[\frac{1}{a^k_i}\right] d^k_L + (d^k_U)'\,\mathrm{Diag}\!\left[\frac{1}{b^k_i}\right] d^k_U\right](\nabla f^k)'\, d^k}_{(**)},
\end{aligned}$$

where $Q^k$ is the matrix defined by

$$Q^k = \begin{pmatrix} \dfrac{K_1}{\varepsilon^k c^k} - K_4 - \gamma^k & -\dfrac{K_6}{2} & -\dfrac{K_8}{2} \\[1ex] -\dfrac{K_6}{2} & K_2 & -\dfrac{K_7}{2} \\[1ex] -\dfrac{K_8}{2} & -\dfrac{K_7}{2} & \dfrac{K_3}{\varepsilon^k c^k} - K_5 - \gamma^k \end{pmatrix}. \qquad (33)$$

We want to show that, for $\varepsilon^k$ small enough, $Q^k$ is a positive definite matrix with eigenvalues uniformly bounded away from 0. To this end we note that we can write, for any $k$,

$$Q^k = \begin{pmatrix} \dfrac{K_1}{\varepsilon^k c^k} - \dfrac{K_1}{\tilde{\varepsilon}\tilde{c}} - (\gamma^k - \tilde{\gamma}) & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \dfrac{K_3}{\varepsilon^k c^k} - \dfrac{K_3}{\tilde{\varepsilon}\tilde{c}} - (\gamma^k - \tilde{\gamma}) \end{pmatrix} + \begin{pmatrix} \dfrac{K_1}{\tilde{\varepsilon}\tilde{c}} - K_4 - \tilde{\gamma} & -\dfrac{K_6}{2} & -\dfrac{K_8}{2} \\[1ex] -\dfrac{K_6}{2} & K_2 & -\dfrac{K_7}{2} \\[1ex] -\dfrac{K_8}{2} & -\dfrac{K_7}{2} & \dfrac{K_3}{\tilde{\varepsilon}\tilde{c}} - K_5 - \tilde{\gamma} \end{pmatrix}, \qquad (34)$$

where $\tilde{\varepsilon}$, $\tilde{c}$ and $\tilde{\gamma}$ are positive constants such that the second matrix on the right-hand side of (34) is positive definite (this is always possible for $\tilde{\varepsilon}$, $\tilde{c}$ and $\tilde{\gamma}$ small enough, as can be verified by using Sylvester's criterion). Since $\varepsilon^k$ and $\gamma^k$ are positive quantities tending to 0, and since $c^k$ is a positive quantity bounded from above, we have that, for $k$ sufficiently large, the first matrix on the right-hand side of (34) is positive semidefinite, from which the assertion on the uniform positive definiteness of $Q^k$ readily follows.

By Proposition 3.4 in [13], we can assume, without loss of generality, that $x^k \to \bar{x} \in \mathcal{F} \cap S$. This implies that $c^k$ is bounded away from 0 and, recalling (15), (16), (19) and $\varepsilon^k \downarrow 0$, also that $\|d^k_L\| \to 0$ and $\|d^k_U\| \to 0$.

Since by (v) $\|d^k\|$ is bounded, we can assume, without loss of generality, that $d^k$ admits a limit, so that two cases can occur: (a) $d^k$ converges to 0; (b) $d^k$ converges to a vector different from 0.

(a) In this case, the term (**) can be majorized by the following expression:

$$M_0\left(M_1\frac{\|d^k_L\|^2}{\varepsilon^k} + M_2\frac{\|d^k_U\|^2}{\varepsilon^k}\right)\left|(\nabla f^k)'\, d^k\right|, \qquad (35)$$

where

$$M_0 \ge \frac{1}{(c^k)^2}, \qquad M_1 \ge \max_{1 \le i \le n}\frac{1}{a^k_i}, \qquad M_2 \ge \max_{1 \le i \le n}\frac{1}{b^k_i}.$$

Note that $M_0$, $M_1$, $M_2$ satisfying the above relations exist because the respective right-hand sides are bounded from above, by Proposition 3.1 and the definition of the set $S$. Now, taking into account (34), (35) and the fact that $|(\nabla f^k)'\, d^k|$ goes to 0, we easily see that eventually, in (32), the term (**) is dominated by the quadratic term defined by $Q^k$. Since the same happens for the term (*), because $\varepsilon^k$ goes to 0, we obtain a contradiction from (32).

(b) In this case we have, as already observed, that $d^k_L \to 0$ and $d^k_U \to 0$, so that $d^k_N \to \tilde{d}_N \ne 0$. We can write

$$(\nabla f^k)'\, d^k = (\nabla f^k_L)'\, d^k_L + (\nabla f^k_N)'\, d^k_N + (\nabla f^k_U)'\, d^k_U.$$

Then, using observation (ii) above and recalling Assumption 4, we have that, eventually,

$$(\nabla f^k)'\, d^k < 0,$$

so that the term (**) is nonpositive. Since the quadratic term in (32) tends to a negative quantity and the term (*) tends to zero, again we obtain a contradiction from (32), and the proof is complete. □

6 The algorithm

At this point of our analysis we have all the tools we need to describe our algorithm and to prove its properties. The results described in the previous section show that the direction $d^k$ given by (19) satisfies Assumption 2 for every positive $\varepsilon$, while Assumption 3 is fulfilled if $\varepsilon$ is smaller than a threshold value $\bar{\varepsilon}$ (see Theorem 5.4). However, the value $\bar{\varepsilon}$ is generally not known in advance, and therefore it has to be determined during the minimization process. Actually, it turns out that this is an easy task that can be accomplished by modifying the procedure linesearch in the following way.

ε-linesearch: If

$$\nabla P(x^k, \varepsilon)^T d^k \le -\bar{\gamma}\,\varepsilon\,\|d^k\|^2, \qquad (36)$$

then find the smallest integer $i = 0, 1, \ldots$ such that

$$x^k + 2^{-i} d^k \in S \qquad (37)$$

$$P(x^k + 2^{-i} d^k, \varepsilon) \le R + \gamma\, 2^{-i}\,\nabla P(x^k, \varepsilon)^T d^k, \qquad (38)$$

set $\alpha^k = 2^{-i}$, $\ell = k + 1$ and update $R$; otherwise set $\varepsilon = 0.5\,\varepsilon$ and restart the NMSB algorithm with $x^0 = x^k$ if $P(x^k, \varepsilon) \le P(x^a, \varepsilon)$, and $x^0 = x^a$ otherwise.

Basically, the procedure ε-linesearch differs from the procedure linesearch in that, while in the procedure linesearch it was "vaguely" required to modify $d^k$ to satisfy Assumption 3, in the procedure ε-linesearch it is checked whether this assumption holds. If not, the value of $\varepsilon$ is reduced. Roughly speaking, Theorem 5.4 guarantees that after a finite number of reductions the value of $\varepsilon$ settles down and Assumption 3 is always satisfied. Note also that when a reduction of $\varepsilon$ takes place, and hence, in a sense, we change the objective function, we also restart the minimization process from $x^a$ if this leads to a better penalty function value.

We call the algorithm obtained by substituting the procedure linesearch with the procedure ε-linesearch algorithm ε-NMSB. The following theorem can be proved.

Theorem 6.1 Suppose that we employ algorithm ε-NMSB to minimize $P(x, \varepsilon)$. Suppose also that Assumptions 1 and 4 hold. Then:

(a) after a finite number of iterations the penalty parameter $\varepsilon$ stays fixed;

(b) there exists at least one limit point of the sequence $\{x^k\}$ generated by the algorithm;

(c) every limit point of the sequence $\{x^k\}$ is a KKT point of Problem (PB).

Proof. We first prove point (a). The proof is by contradiction. Suppose that $\varepsilon$ is reduced an infinite number of times. Then there exist subsequences $\{\varepsilon^k\}_K$ and $\{x^k\}_K$ such that $\{\varepsilon^k\}_K \downarrow 0$ and, for every $k \in K$, (26) is violated, i.e.

$$\nabla P(x^k, \varepsilon^k)^T d^k > -\bar{\gamma}\,\varepsilon^k\,\|d^k\|^2.$$

Since $\{\varepsilon^k\}_K \downarrow 0$ and taking into account (37) and the fact that, by the instructions of the ε-NMSB algorithm, $P(x^k, \varepsilon) \le P(x^a, \varepsilon)$ and $x^k \in S$, we can apply Theorem 5.4. But then we obtain a contradiction to Theorem 5.4, because eventually $\bar{\gamma}\varepsilon^k$ becomes smaller than $\gamma$, where $\gamma$ is the positive constant whose existence is proved in Theorem 5.4. Thus point (a) is proved. Points (b) and (c) now readily follow from Theorem 4.1. □

We now pass to analyzing the local properties of the algorithm. We first show that if convergence occurs towards a point satisfying the strong second order sufficient condition and exact second order information is used, then the convergence rate is quadratic.

Theorem 6.2 Suppose that the sequence $\{x^k\}$ produced by the algorithm converges to a point $\bar{x}$ satisfying the strong second order sufficient condition, and that eventually $H^k = \nabla^2 f(x^k)$. Then, eventually, $x^{k+1} = x^k + d^k$ (i.e. the stepsize of one is eventually accepted) and the convergence rate is quadratic.

Proof. We first make two preliminary observations. The first is that the gradient of $P$ is semismooth according to the definition of [28, 32]. This follows easily from the expression (9) and the facts that the composition of semismooth functions is semismooth, that the max operator is semismooth, and that smooth functions are also semismooth [28]. The second observation is that the direction $d^k$ used by the algorithm can also be obtained as the solution of the following linear system:

$$\begin{bmatrix} H^k & E^T_{L^k} & -E^T_{U^k} \\ E_{L^k} & 0 & 0 \\ -E_{U^k} & 0 & 0 \end{bmatrix} \begin{bmatrix} d^k \\ z_{L^k} \\ z_{U^k} \end{bmatrix} = -\begin{bmatrix} \nabla f^k \\ (x^k - l)_{L^k} \\ (u - x^k)_{U^k} \end{bmatrix}. \qquad (39)$$

This easily follows from the special structure of the matrices $E_{L^k}$ and $E_{U^k}$ and of the systems (39) and (19). This means that the direction $d^k$ is the same direction considered in [15] with reference to a local algorithm for the solution of inequality constrained problems of general type. Taking into account these two observations, and the fact that eventually the penalty parameter is no longer changed (see the previous theorem), the theorem readily follows from [15, Theorem 3.2] and [11, Theorem 3.2]. □

Remark 6.1 It may be interesting to note that at the beginning of the paper we made, for simplicity, the blanket assumption that $f$ is three times continuously differentiable. However, it should be clear from the proofs of Theorems 6.1 and 6.2 that to establish global convergence it is sufficient to assume continuous differentiability of the objective function, while to prove the quadratic convergence rate of the algorithm it is enough to assume that the Hessian of $f$ is semismooth. Furthermore, these differentiability assumptions are only needed in $S$.

Remark 6.2 In Theorem 6.2 we made the assumption that $\{x^k\} \to \bar{x}$. We remark, however, that it is standard to prove that if one of the limit points of the sequence $\{x^k\}$ generated by the algorithm satisfies the strong second order sufficient condition, then the whole sequence converges to this point.

Exploiting the results of [15], it is now easy to analyze also the case in which quasi-Newton methods are employed.

Theorem 6.3 Suppose that the sequence $\{x^k\}$ produced by the algorithm converges to a point $\bar{x}$ satisfying the strong second order sufficient condition, and that

$$\lim_{k \to \infty} \frac{\|(H^k_{N^k} - \nabla^2 f^k_{N^k})\, d^k\|}{\|d^k\|} = 0.$$

Then, eventually, $x^{k+1} = x^k + d^k$ (i.e. the stepsize of one is eventually accepted) and the convergence rate is superlinear.

Proof. The proof is similar to that of the previous theorem, the only difference being that this time we invoke [15, Theorem 5.2] instead of [15, Theorem 3.2]. □
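To summarize Sections 4-6 operationally, the following simplified driver sketches the ε-NMSB logic: compute $d^k$ from (19), test (36), halve $\varepsilon$ (possibly restarting from $x^a$) when the test fails, and otherwise take an Armijo-type step satisfying (37)-(38). For brevity this is the monotone special case ($\bar{m} = 0$, $\Delta_0 = 0$, so every iteration is a linesearch step); the names are our own assumptions, and the full NMSB bookkeeping (n-steps, f-steps, backtracking, the reference value $R_j$) is omitted.

```python
import numpy as np

def eps_nmsb(x_a, eps0, P, grad_P, direction, in_S,
             gamma=1e-4, gamma_bar=1e-6, tol=1e-8, max_iter=1000):
    """Simplified sketch of the eps-NMSB loop (assumed helper callables:
    direction(x, eps) returns d from (19); P and grad_P evaluate the
    penalty function and its gradient; in_S tests membership in S)."""
    eps, x = eps0, np.asarray(x_a, dtype=float).copy()
    for _ in range(max_iter):
        d = direction(x, eps)
        if np.linalg.norm(d) <= tol:           # d = 0: stationary (Thm 5.2)
            break
        gPd = grad_P(x, eps) @ d
        if gPd > -gamma_bar * eps * (d @ d):   # test (36) fails:
            eps *= 0.5                         # halve eps and restart
            if P(x, eps) > P(x_a, eps):
                x = np.asarray(x_a, dtype=float).copy()
            continue
        alpha = 1.0                            # backtracking (37)-(38)
        while True:
            trial = x + alpha * d
            if in_S(trial) and P(trial, eps) <= P(x, eps) + gamma * alpha * gPd:
                x = trial
                break
            alpha *= 0.5
            if alpha < 1e-16:
                raise RuntimeError("linesearch breakdown")
    return x, eps
```

By Theorem 6.1, the ε-halving branch can fire only finitely many times, so the loop eventually behaves as a pure (non)monotone linesearch method on a fixed merit function.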

7 Conclusion

In this paper we described a new globally and superlinearly convergent algorithm for the solution of box constrained optimization problems. The algorithm is based on a continuously differentiable merit function and on a nonmonotone linesearch technique, and uses a new identification technique for the active constraints which seems promising, see [13, 14]. Among the favourable characteristics of the new algorithm we recall the low computational cost per iteration and the fact that superlinear convergence can be established without requiring strict complementarity. Furthermore, although this issue was not addressed in this paper, it is not difficult to envisage a truncated version of the algorithm described here, see [14], which we believe to be, on the basis of some very preliminary computational experience, promising for the solution of large-scale problems.

8 Appendix A

To establish Theorem 4.1, we first need some technical results. First of all we remark that, by Proposition 3.1, the set

$$\mathcal{L}_0 := \{x \in S : P(x, \varepsilon) \le P(x^a, \varepsilon)\}$$

is a compact set (for every fixed $\varepsilon > 0$).

Lemma 8.1 Let

$$F^j = \max_{0 \le i \le m(j)} P(x^{\ell(j-i)}, \varepsilon), \qquad (40)$$

and assume that Algorithm NMSB produces an infinite sequence $\{x^k\}$; then:

(a) the sequence $\{F^j\}$ is nonincreasing and has a limit $\hat{F}$;

(b) for any index $j$ we have $F^i < F^j$ for all $i \ge j + \bar{m} + 1$;

(c) $\{x^k\}$ remains in a compact set.

Proof. We observe first that $\{x^k\}$ contains a subsequence of points $x^{\ell(j)}$ where the objective function is evaluated. At each of these points we can define a new value $F^{j+1}$ according to (40). By definition, the number of previous function values that are taken into account for determining $F^{j+1}$ increases at most by one at each $j$-update, that is, $m(j+1) \le m(j) + 1$. Therefore we can write

$$F^{j+1} = \max_{0 \le i \le m(j+1)} P(x^{\ell(j+1-i)}, \varepsilon) \le \max_{0 \le i \le m(j)+1} P(x^{\ell(j+1-i)}, \varepsilon) = \max\left[P(x^{\ell(j+1)}, \varepsilon),\ \max_{0 \le i \le m(j)} P(x^{\ell(j-i)}, \varepsilon)\right] = \max\left[P(x^{\ell(j+1)}, \varepsilon),\ F^j\right].$$

On the other hand, the instructions of Algorithm NMSB and the condition on $R_{j+1}$ ensure that

$$P(x^{\ell(j+1)}, \varepsilon) < R_j \le F^j, \qquad (41)$$

and therefore we get, for all $j$,

$$F^{j+1} \le F^j. \qquad (42)$$

From (41) and (42) it follows that $P(x^{\ell(j)}, \varepsilon) \le R_0 = F^0 = P(x^a, \varepsilon)$ and hence that $x^{\ell(j)} \in \mathcal{L}_0$ for all $j$. Since $\mathcal{L}_0$ is a compact set, we have that the sequence $\{F^j\}$ is bounded from below, so that, by (42), there exists $\hat{F}$ such that

$$\lim_{j \to \infty} F^j = \hat{F},$$

and this establishes (a).

Property (b) follows from (41) and the fact that, for all $j$, $F^j$ is computed by taking the maximum over at most $\bar{m} + 1$ previous function values.

As regards (c), we first observe that since the level set $\mathcal{L}_0$ is bounded and $x^{\ell(j)} \in \mathcal{L}_0$, there exists a number $\eta$ such that $\|x^{\ell(j)}\| \le \eta$ for all $j$. Further, the algorithm ensures that the objective function is computed at least every $N$ iterations, that is, $\ell(j+1) \le \ell(j) + N$, so that, for any $x^k \notin \{x^{\ell(j)}\}$, there exist an integer $\nu_k \le N$ and a point $x^r \in \{x^{\ell(j)}\}$ such that

$$x^k = x^r + \sum_{i=0}^{\nu_k - 1} d^{r+i}.$$

Since the points $x^{r+i}$, $i = 1, \ldots, \nu_k$, do not belong to $\{x^{\ell(j)}\}$, by the test at n-step we have $\|d^{r+i}\| \le \Delta_0$, $i = 0, 1, \ldots, \nu_k - 1$, so that $\|x^k\| \le \|x^r\| + \Delta_0\nu_k \le \eta + \Delta_0 N$, which shows that the whole sequence $\{x^k\}$ is bounded. □

Lemma 8.2 Assume that Algorithm NMSB produces an infinite sequence $\{x^k\}$; let $\{x^{\ell(j)}\}$ be the sequence of points where the objective function is evaluated and let $q(k)$ be the index defined by

$$q(k) = \max[j : \ell(j) \le k]. \qquad (43)$$

Then there exists a sequence $\{x^{s(j)}\}$ satisfying the following conditions:

(a) $F^j = P(x^{s(j)}, \varepsilon)$, for $j = 0, 1, \ldots$;

(b) for any integer $k$, there exist indices $h_k$ and $j_k$ such that

$$0 < h_k - k \le N(\bar{m} + 1), \qquad h_k = s(j_k), \qquad (44)$$

$$F^{j_k} = P(x^{h_k}, \varepsilon) < F^{q(k)}. \qquad (45)$$

Proof. Let $s(j)$ be an index in the set $\{\ell(j), \ell(j-1), \ldots, \ell(j - m(j))\}$ such that

$$P(x^{s(j)}, \varepsilon) = \max_{0 \le i \le m(j)} P(x^{\ell(j-i)}, \varepsilon);$$

then (a) follows from the definition of $F^j$.

Since $m(j)$ is bounded by the integer $\bar{m}$ and $\ell(j) \to \infty$ for $j \to \infty$, we have that $s(j) \to \infty$. Let now $x^k$ be any point produced by the algorithm and let $q(k)$ be the index defined by (43) (thus $\ell(q(k))$ is the largest index not exceeding $k$ of an iteration that evaluates the objective function). We note that, by (43), $q(h) > q(k)$ implies $h > k$.

Consider the index $j_k = q(k) + \bar{m} + 1$; by the definition of $F^{j_k}$, there is a point $x^{h_k} = x^{s(j_k)}$ such that

$$P(x^{h_k}, \varepsilon) = F^{j_k} \qquad \text{and} \qquad j_k \ge q(h_k) \ge j_k - m(j_k).$$

Therefore we have $q(h_k) \ge j_k - m(j_k) \ge j_k - \bar{m} = q(k) + 1$, and this implies that $h_k > k$. Moreover, since $q(h_k) - q(k) \le \bar{m} + 1$ and the function is evaluated at least every $N$ iterations, we have that

$$h_k - k \le (\bar{m} + 1)N.$$

Finally, by (b) of Lemma 8.1 we have $F^{j_k} < F^{q(k)}$, which completes the proof of assertion (b). □

Lemma 8.3 Assume that Algorithm NMSB produces an infinite sequence $\{x^k\}$. Then we have

$$\lim_{k \to \infty} P(x^k, \varepsilon) = \lim_{j \to \infty} F^j = \hat{F}, \qquad (46)$$

$$\lim_{k \to \infty} \|x^{k+1} - x^k\| = 0. \qquad (47)$$

Proof. Let $\{x^k\}_K$ denote the set (possibly empty) of points satisfying the test

$$\|d^k\| \le \Delta_0\delta^t, \qquad k \in K, \qquad (48)$$

where the integer $t$ increases with $k \in K$; when $k \in K$ we set, for convenience, $\alpha^k = 1$. It follows from (48) that, if $K$ is an infinite set, we have

$$\lim_{\substack{k \to \infty \\ k \in K}} \alpha^k\|d^k\| = 0. \qquad (49)$$

Let now $s(j)$ and $q(k)$ be the indices defined in Lemma 8.2. We show by induction that, for any fixed integer $i \ge 1$, we have

$$\lim_{j \to \infty} \alpha^{s(j)-i}\|d^{s(j)-i}\| = 0, \qquad (50)$$

$$\lim_{j \to \infty} P(x^{s(j)-i}, \varepsilon) = \lim_{j \to \infty} P(x^{s(j)}, \varepsilon) = \lim_{j \to \infty} F^j = \hat{F}. \qquad (51)$$

(Here and in the sequel we assume that the index $j$ is large enough to avoid the occurrence of negative subscripts.) Assume first that $i = 1$. If $s(j) - 1 \in K$, (50) holds with $k = s(j) - 1$. Otherwise, if $s(j) - 1 \notin K$, recalling the acceptability criterion of the nonmonotone line search, we can write

$$F^j = P(x^{s(j)}, \varepsilon) = P(x^{s(j)-1} + \alpha^{s(j)-1} d^{s(j)-1}, \varepsilon) \qquad (52)$$
$$\le F^{q(s(j)-1)} + \gamma\,\alpha^{s(j)-1}\,\nabla P(x^{s(j)-1}, \varepsilon)^T d^{s(j)-1}. \qquad (53)$$

It follows that

$$F^{q(s(j)-1)} - F^j \ge \gamma\,\alpha^{s(j)-1}\left|\nabla P(x^{s(j)-1}, \varepsilon)^T d^{s(j)-1}\right|. \qquad (54)$$

Therefore, if $s(j) - 1 \notin K$ for an infinite subsequence, from (a) of Lemma 8.1 and (54) we get $\alpha^{s(j)-1}\nabla P(x^{s(j)-1}, \varepsilon)^T d^{s(j)-1} \to 0$, so that, by Assumption 3 on the search direction and $\alpha^{s(j)-1} \le 1$, we also have $\alpha^{s(j)-1}\|d^{s(j)-1}\| \to 0$ for this subsequence. It can be concluded that (50) holds for $i = 1$. Moreover, since

$$P(x^{s(j)}, \varepsilon) = P(x^{s(j)-1} + \alpha^{s(j)-1} d^{s(j)-1}, \varepsilon),$$

by (50) and the uniform continuity of $P$ on the compact set containing $\{x^k\}$, equation (51) holds for $i = 1$.

Assume now that (50) and (51) hold for a given $i$ and consider the point $x^{s(j)-i-1}$. Reasoning as before, we can again distinguish the case $s(j) - i - 1 \in K$, when (48) holds with $k = s(j) - i - 1$, and the case $s(j) - i - 1 \notin K$, in which we have

$$P(x^{s(j)-i}, \varepsilon) \le F^{q(s(j)-i-1)} + \gamma\,\alpha^{s(j)-i-1}\,\nabla P(x^{s(j)-i-1}, \varepsilon)^T d^{s(j)-i-1}$$

and hence

$$F^{q(s(j)-i-1)} - P(x^{s(j)-i}, \varepsilon) \ge \gamma\,\alpha^{s(j)-i-1}\left|\nabla P(x^{s(j)-i-1}, \varepsilon)^T d^{s(j)-i-1}\right|. \qquad (55)$$

Then, using (49), (51), (55) and recalling that the direction satisfies Assumption 3, we can assert that equation (50) holds with $i$ replaced by $i + 1$. By (50) and the uniform continuity of $P$, it follows that (51) is also satisfied with $i$ replaced by $i + 1$, which completes the induction.

Let now $x^k$ be any given point produced by the algorithm. Then, by Lemma 8.2, there is a point $x^{h_k} \in \{x^{s(j)}\}$ such that

$$0 < h_k - k \le (\bar{m} + 1)N. \qquad (56)$$

Then we can write

$$x^k = x^{h_k} - \sum_{i=1}^{h_k - k} \alpha^{h_k - i} d^{h_k - i},$$

and this implies, by (50) and (56), that

$$\lim_{k \to \infty} \|x^k - x^{h_k}\| = 0. \qquad (57)$$

From the uniform continuity of $P$ it follows that

$$\lim_{k \to \infty} P(x^k, \varepsilon) = \lim_{k \to \infty} P(x^{h_k}, \varepsilon) = \lim_{j \to \infty} F^j, \qquad (58)$$

and (46) is proved.

If $k \notin K$, we obtain $P(x^{k+1}, \varepsilon) \le F^{q(k)} + \gamma\,\alpha^k\,\nabla P(x^k, \varepsilon)^T d^k$ and hence we have that

$$F^{q(k)} - P(x^{k+1}, \varepsilon) \ge \gamma\,\alpha^k\left|\nabla P(x^k, \varepsilon)^T d^k\right|. \qquad (59)$$

Therefore, by (49), (58), (59) and Assumption 3, we can conclude that

$$\lim_{k \to \infty} \alpha^k\|d^k\| = 0,$$

which establishes (47). □

Finally, we can prove Theorem 4.1.

Proof of Theorem 4.1. We observe first that if the algorithm terminates after a finite number of iterations, the thesis follows from Theorem 5.2 and the stopping criterion. Suppose then that the sequence $\{x^k\}$ is infinite. By Lemma 8.1, all the points of the sequence belong to a compact set and therefore $\{x^k\}$ admits at least one limit point. Denote by $\bar{x}$ any such limit point, and relabel $\{x^k\}$ a subsequence converging to $\bar{x}$. By (47) of Lemma 8.3, we have

$$\lim_{k \to \infty} \alpha^k\|d^k\| = 0. \qquad (60)$$

Then either $\lim_{k \to \infty}\|d^k\| = 0$, or there exists a subsequence $\{x^k\}_{K_1}$ of $\{x^k\}$ such that $\lim_{k \in K_1,\, k \to \infty} \alpha^k = 0$. In the first case the thesis follows from Theorem 5.3; let us then consider the second case. In the second case we can assume, without loss of generality, that there are two possibilities:

(a) the sequence $\{\alpha^k\}_{K_1}$ is such that

$$x^k + 2\alpha^k d^k \in \mathbb{R}^n \setminus S; \qquad (61)$$

(b) the sequence $\{\alpha^k\}_{K_1}$ is such that

$$P(x^k + 2\alpha^k d^k, \tilde{\varepsilon}) > P(x^k, \tilde{\varepsilon}) + 2\gamma\,\alpha^k\,\nabla P(x^k, \tilde{\varepsilon})^T d^k,$$

where we took into account that $R$ is the maximum among the last $\bar{m}$ previous values of $P$.

We first analyse case (a). Since $\{x^k\}_{K_1} \to \bar{x}$, we have, by (60), that

$$\{x^k + 2\alpha^k d^k\}_{K_1} \to \bar{x}. \qquad (62)$$

But, taking into account that, by (46), $\{P(x^k, \tilde{\varepsilon})\}_{K_1}$ tends to a finite value, we have that actually $\bar{x} \in S$. Hence, taking into account that $S$ is an open set, we get a contradiction between (61) and (62).

Let us then examine case (b). By the mean value theorem we can find, for $k \in K_1$ sufficiently large, a point $u^k = x^k + \omega^k 2\alpha^k d^k$ with $\omega^k \in (0, 1)$ such that

$$\nabla P(u^k, \tilde{\varepsilon})^T d^k \ge \gamma\,\nabla P(x^k, \tilde{\varepsilon})^T d^k. \qquad (63)$$

Let now $\{x^k\}_{K_2}$ be a subsequence of $\{x^k\}_{K_1}$ such that

$$\lim_{\substack{k \to \infty \\ k \in K_2}} \frac{d^k}{\|d^k\|} = \bar{d}.$$

By (60) we have $\{u^k\}_{K_2} \to \bar{x}$, so that, dividing both members of (63) by $\|d^k\|$ and taking limits, we obtain $(1 - \gamma)\,\nabla P(\bar{x}, \tilde{\varepsilon})^T \bar{d} \ge 0$. Since $1 - \gamma > 0$, we get

$$\nabla P(\bar{x}, \tilde{\varepsilon})^T \bar{d} \ge 0. \qquad (64)$$

But, by the linesearch test (36), we also have, for all $k \in K_2$,

$$\frac{\nabla P(x^k, \tilde{\varepsilon})^T d^k}{\|d^k\|^2} < -\bar{\gamma}\tilde{\varepsilon},$$

which implies $\nabla P(\bar{x}, \tilde{\varepsilon})^T \bar{d} < 0$, which, in turn, contradicts (64). □

References

[1] D.P. Bertsekas: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.

[2] D.P. Bertsekas: Projected Newton methods for optimization problems with simple constraints. SIAM Journal on Control and Optimization 20, 1982, pp. 221-246.

[3] J. Burke and J. Moré: On the identification of active constraints. SIAM Journal on Numerical Analysis 25, 1988, pp. 1197-1211.

[4] P. Calamai and J. Moré: Projected gradient for linearly constrained problems. Mathematical Programming (Series A) 39, 1987, pp. 93-116.

[5] T. Coleman and L. Hulbert: A direct active set algorithm for large sparse quadratic programs with simple bounds. Mathematical Programming (Series A) 45, 1987, pp. 373-406.

[6] T. Coleman and L. Hulbert: A globally and superlinearly convergent algorithm for convex quadratic programs with simple bounds. SIAM Journal on Optimization 3, 1993, pp. 298-321.

[7] A. Conn, N. Gould and Ph. Toint: Algorithms for minimization subject to bounds. SIAM Journal on Numerical Analysis 25, 1988, pp. 433-460.

[8] A. Conn, N. Gould and Ph. Toint: Testing a class of methods for solving minimization problems with simple bounds on the variables. Mathematics of Computation 50, 1988, pp. 399-430.

[9] R. Cottle and M. Goheen: A special class of large quadratic programs. In Nonlinear Programming 3, O.L. Mangasarian, R.R. Meyer and S. Robinson (eds.), Academic Press, New York, 1978, pp. 361-390.

[10] J. Dunn: Global and asymptotic convergence rate estimates for a class of projected gradient processes. SIAM Journal on Control and Optimization 19, 1981, pp. 368-400.

[11] F. Facchinei: Minimization of SC1 functions and the Maratos effect. Operations Research Letters 17, 1995, pp. 131-137.

[12] F. Facchinei and S. Lucidi: A method for the minimization of a quadratic convex function over the simplex. In Operations Research Proceedings 1990, W. Bühler et al. (eds.), Springer-Verlag, Berlin, 1992, pp. 125-132.

[13] F. Facchinei and S. Lucidi: A class of penalty functions for optimization problems with bound constraints. Optimization 26, 1992, pp. 239-259.

[14] F. Facchinei and S. Lucidi: A class of methods for optimization problems with simple bounds. Part 2: Algorithms and numerical results. Technical Report R.336, IASI-CNR, Roma, Italy, 1992.

[15] F. Facchinei and S. Lucidi: Quadratically and superlinearly convergent algorithms for the solution of inequality constrained minimization problems. Journal of Optimization Theory and Applications 85, 1995, pp. 265-289.

[16] A. Friedlander, J.M. Martínez and S.A. Santos: A new trust region algorithm for bound constrained minimization. Applied Mathematics and Optimization 30, 1994, pp. 235-266.

[17] P. Gill, W. Murray and M. Wright: Practical Optimization. Academic Press, New York, 1981.

[18] L. Grippo, F. Lampariello and S. Lucidi: A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis 23, 1986, pp. 707-716.

[19] L. Grippo, F. Lampariello and S. Lucidi: A class of nonmonotone stabilization methods in unconstrained optimization. Numerische Mathematik 59, 1991, pp. 779-805.

[20] L. Grippo and S. Lucidi: A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization 22, 1991, pp. 557-578.

[21] L. Grippo and S. Lucidi: On the solution of a class of quadratic programs using a differentiable exact penalty function. In System Modelling and Optimization, H.J. Sebastian and K. Tammer (eds.), Springer-Verlag, Berlin, 1990, pp. 764-773.

[22] M. Lescrenier: Convergence of trust region algorithms for optimization with bounds when strict complementarity does not hold. SIAM Journal on Numerical Analysis 28, 1991, pp. 476-495.

[23] W. Li: Differentiable piecewise quadratic exact penalty functions for quadratic programs with simple bound constraints. Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA, 1994. To appear in SIAM Journal on Optimization.

[24] W. Li: Linearly convergent descent methods for unconstrained minimization of convex quadratic splines. Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA, 1994. To appear in Journal of Optimization Theory and Applications.

[25] W. Li and J. Swetits: A new algorithm for strictly convex quadratic programs. Technical Report TR92-1, Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA, 1992. To appear in SIAM Journal on Optimization.

[26] W. Li and J. Swetits: A Newton method for convex regression, data smoothing, and quadratic programming with bounded constraints. SIAM Journal on Optimization 3, 1993, pp. 466-488.

[27] G.P. McCormick: Nonlinear Programming. John Wiley & Sons, New York, 1983.

[28] R. Mifflin: Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization 15, 1977, pp. 957-972.

[29] J. Moré and G. Toraldo: Algorithms for bound constrained quadratic programming problems. Numerische Mathematik 55, 1989, pp. 377-400.

[30] J. Moré and G. Toraldo: Numerical solution of large quadratic programming problems with bound constraints. SIAM Journal on Optimization 1, 1991, pp. 93-113.

[31] S. Nash and A. Sofer: A barrier method for large-scale constrained optimization. ORSA Journal on Computing 5, 1993, pp. 40-53.

[32] L. Qi and J. Sun: A nonsmooth version of Newton's method. Mathematical Programming (Series A) 58, 1993, pp. 353-368.

[33] S.M. Robinson: Generalized equations. In Mathematical Programming: The State of the Art, A. Bachem, M. Grötschel and B. Korte (eds.), Springer-Verlag, Berlin, 1983, pp. 346-367.

[34] S. Wright: Algorithms for minimization subject to bounds. Technical Report MCS-P32-1288, Argonne National Laboratory, Mathematics and Computer Science Division, December 1988.