On Some Properties of Quadratic Programs With a Convex Quadratic Constraint



Università degli Studi di Roma "La Sapienza"
Dipartimento di Informatica e Sistemistica

S. Lucidi, L. Palagi, M. Roma

On some properties of quadratic programs with a convex quadratic constraint

Tech. Rep. 33-96

Stefano Lucidi, Laura Palagi, Massimo Roma
Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza"
via Buonarroti 12 - 00185 Roma, Italy.
E-mail: [email protected], [email protected], [email protected]

This work was partially supported by Agenzia Spaziale Italiana, Roma, Italy.

On some properties of quadratic programs with a convex quadratic constraint

Stefano Lucidi, Laura Palagi, Massimo Roma

Abstract

In this paper we consider the problem of minimizing a (possibly nonconvex) quadratic function with a quadratic constraint. We point out some new properties of the problem. In particular, in the first part of the paper, we show that (i) given a KKT point that is not a global minimizer, it is easy to find a "better" feasible point; (ii) strict complementarity holds at the local-nonglobal minimizer.

In the second part, we show that the original constrained problem is equivalent to the unconstrained minimization of a piecewise quartic merit function. Using the unconstrained formulation we give, in the nonconvex case, a new second order necessary condition for global minimizers.

In the third part, algorithmic applications of the preceding results are briefly outlined and some preliminary numerical experiments are reported.

Key words: quadratic function, l2-norm constraint, merit function.
AMS subject classification: 90C30, 65K05

1 Introduction

In this paper we study the problem of minimizing a general quadratic function q : R^n → R subject to an ellipsoidal constraint, that is

    min{ q(x) : x^T H x ≤ a^2 },    (1)

where H is a symmetric positive definite n × n matrix and a is a positive scalar. The interest in this problem initially arose in the context of trust region methods for solving unconstrained optimization problems. In fact, such methods require at each iteration an approximate solution of Problem (1), where q(x) is a local quadratic model of the objective function over a restricted ellipsoidal region centered around the current iterate.
However, recently, it has been shown that problems with the same structure as (1) play an important role not only in the field of unconstrained minimization. In fact, the solution of Problem (1) is at the basis of algorithms for solving general constrained nonlinear problems (e.g. [3, 42, 20, 27]) and integer programming problems (e.g. [21, 41, 22, 31, 19]).

Many papers have been devoted to pointing out the specific features of Problem (1). Among the most important results are the necessary and sufficient conditions for a point x to be a global minimizer, due to Gay [12] and Sorensen [35], and the characterization and uniqueness of the local-nonglobal minimizer due to Martínez [26]. The particular structure of Problem (1) has led to the development of algorithms for finding a global solution. The first algorithms proposed in the literature were those of Gay and Sorensen [12, 35]. Moré and Sorensen [29] developed an algorithm that produces an approximate global minimizer in a finite number of steps. More recently, it has been proved that an approximation to the global solution can be computed in polynomial time (see for example [39, 38, 40, 41, 21]). Furthermore, Moré [28] has considered a more general case, by allowing in Problem (1) a general quadratic constraint, and has extended the results of [12, 35, 29].

In spite of all these results, there is still interest in studying Problem (1). In fact, as mentioned before, there is a growing use of Problem (1) as a tool for tackling large nonlinear programming problems and combinatorial optimization problems. This leads to the necessity of solving large scale problems of the type (1) more and more efficiently, and motivates further research on theoretical properties of Problem (1) and on the definition of efficient methods for locating its global minimizers. Recently, some interesting algorithms for tackling large scale trust region problems have been proposed in [36, 34, 33].

[Footnotes: † This work was partially supported by Agenzia Spaziale Italiana, Roma, Italy. ‡ Università di Roma "La Sapienza" - Dipartimento di Informatica e Sistemistica - via Buonarroti, 12 - 00185 Roma, Italy, and Gruppo Nazionale per l'Analisi Funzionale e le sue Applicazioni del Consiglio Nazionale delle Ricerche.]
The basic idea behind these algorithms is to recast the trust region problem in terms of a parametrized eigenvalue problem and then to adjust the parameter to find an optimal solution.

In this paper we point out further theoretical properties of Problem (1). In particular, our research develops along two lines: the study of some new properties of its Karush-Kuhn-Tucker points, and its equivalence to an unconstrained minimization problem. Besides their own theoretical interest, these results allow us to define new classes of algorithms for solving large scale trust region problems. These algorithms use only matrix-vector products and do not require the solution of an eigenvalue problem at each iteration (see [24] for details).

The paper is organized as follows. In Section 2 we recall some preliminary results. In Section 3 we show that

(i) given a KKT point x̄ which is not a global minimizer, it is possible to find a new feasible point x̂ such that the objective function is strictly decreased, i.e. q(x̂) < q(x̄);

(ii) the strict complementarity condition holds at the local minimizer; hence, in the nonconvex case, strict complementarity holds at local and global minimizers.

In Section 4 we show that there is a one to one correspondence between KKT points (global minimizers) of Problem (1) and stationary points (global minimizers) of a piecewise quartic merit function P : R^n → R. Therefore, Problem (1) is equivalent to the unconstrained minimization of P over R^n. In Section 5, by exploiting some results of the preceding sections, we give a new second order necessary condition for global minimizers of Problem (1). Finally, in Section 6, we sketch some possible applications of the results of Section 3 and Section 4 for defining new classes of algorithms for solving large scale trust region problems.

In the sequel we will use the following notation. Given a vector x ∈ R^n, we denote by ||x|| the l2-norm on R^n. The l2-norm of an n × n matrix Q is defined by ||Q|| = sup{ ||Qx|| : ||x|| = 1 }. Moreover, we denote by λ_1 ≤ λ_2 ≤ ... ≤ λ_n the eigenvalues of Q.

2 Preliminaries

Without loss of generality we can assume that the feasible set F is defined by

    F = { x ∈ R^n : ||x||^2 ≤ a^2 },

so that the problem under consideration is

    min{ q(x) : ||x||^2 ≤ a^2 },    (2)

where q : R^n → R is given by

    q(x) = (1/2) x^T Q x + c^T x,    (3)

with Q an n × n symmetric matrix and c ∈ R^n. In fact, since H is positive definite, we can reduce Problem (1) to the form (2) by employing the transformation y = H^{1/2} x (however, we refer to [25] for the direct treatment of Problem (1)).

The Lagrangian function associated with Problem (2) is the function

    L(x, λ) = (1/2) x^T Q x + c^T x + λ(||x||^2 − a^2).

A Karush-Kuhn-Tucker (KKT) point for Problem (2) is a pair (x̄, λ̄) ∈ R^n × R such that

    (Q + 2λ̄ I) x̄ = −c,
    λ̄ (||x̄||^2 − a^2) = 0,
    λ̄ ≥ 0,  ||x̄||^2 ≤ a^2.    (4)

Furthermore, we say that strict complementarity holds at a KKT pair (x̄, λ̄) if λ̄ > 0 when ||x̄||^2 = a^2.

It is well known that it is possible to completely characterize the global solutions of Problem (2) without requiring any convexity assumption on the objective function. In fact, the following result, due to Gay [12] and Sorensen [35], holds (see also Vavasis [38]):

Proposition 2.1 A point x* such that ||x*||^2 ≤ a^2 is a global solution of Problem (2) if and only if there exists a unique λ* ≥ 0 such that the pair (x*, λ*) satisfies the KKT conditions

    (Q + 2λ* I) x* = −c,
    λ* (||x*||^2 − a^2) = 0,

and the matrix (Q + 2λ* I) is positive semidefinite. If (Q + 2λ* I) is positive definite, then Problem (2) has a unique global solution.

Moreover, Martínez [26] gave the following characterization of the local-nonglobal minimizer of Problem (2).

Proposition 2.2 There exists at most one local-nonglobal minimizer x̄ of Problem (2). Moreover, we have ||x̄||^2 = a^2 and the KKT necessary conditions hold with λ̄ ∈ (−λ_2/2, −λ_1/2), where λ_1 < λ_2 are the two smallest eigenvalues of Q.

3 Further features of KKT points

In this section we give some new properties of the KKT points of Problem (2). Our interest in the characterization of KKT points is due to the fact that, in general, algorithms for the solution of constrained problems converge towards KKT points. We show that the number of different values that the objective function can take at KKT points is bounded from above by the number of negative eigenvalues of the matrix Q. First we state a preliminary result that extends one given in [38].

Lemma 3.1 Let (x̂, λ) and (x̄, λ) be KKT pairs for Problem (2) with the same KKT multiplier. Then q(x̂) = q(x̄).

Proof We observe that the function q can be rewritten at every KKT pair (x, λ) as follows:

    q(x) = (1/2) c^T x − λ ||x||^2.

By using the KKT conditions we obtain

    q(x̂) = (1/2) c^T x̂ − λ ||x̂||^2 = −(1/2) x̄^T (Q + 2λI) x̂ − λ a^2 = (1/2) c^T x̄ − λ ||x̄||^2 = q(x̄).

Now, we can state the following proposition, whose proof follows from a result of Forsythe and Golub [11] on the number of stationary values of a second degree polynomial on the unit sphere. For the sake of completeness we give a sketch of the proof.

Proposition 3.2 There exist at most min{2m + 2, 2n + 1} KKT points with distinct multipliers λ, where m is the number of distinct negative eigenvalues of Q.

Proof First we observe that at every KKT point (x, λ) such that ||x||^2 < a^2 the value of the objective function q is constant. This easily follows from Lemma 3.1, by observing that all these pairs are characterized by the fact that λ = 0.

Now, we consider the values of the function q at all the points such that ||x||^2 = a^2. Since Q is symmetric, there exists an orthogonal matrix V such that V^T Q V = diag_{i=1,...,n}{λ_i}, where λ_1 ≤ λ_2 ≤ ... ≤ λ_n are the eigenvalues of Q. By considering the transformation x = Vξ we can write the first equation of the KKT conditions (4) (premultiplied by V^T) as follows:

    (diag_{i=1,...,n}{λ_i} + 2λI) ξ = −γ,    (5)

with γ = V^T c. Hence, recalling that ||x||^2 = ||Vξ||^2 = a^2, we have that the KKT multipliers must satisfy the system

    g(λ) = a^2,  λ ≥ 0,    (6)

where

    g(λ) = Σ_{i=1}^{n} γ_i^2 / (λ_i + 2λ)^2.    (7)

The function g has poles at the points −λ_i/2 and it is convex on each subinterval between consecutive poles. Thus there exist at most 2 roots of g(λ) = a^2 in each such subinterval. Moreover, since g(λ) → 0 as λ → ±∞, the equation g(λ) = a^2 has one root in each extreme subinterval. If all the eigenvalues λ_i are positive, there exists at most one nonnegative root; if all the eigenvalues are negative, there are at most 2n nonnegative roots; in the case of m < n negative eigenvalues, there are at most 2m + 1 nonnegative roots. Hence the number of solutions of system (6) is at most min{2m + 1, 2n}.

Finally, by summarizing the two cases, we can conclude that the number of distinct KKT multipliers is bounded above by min{2m + 1, 2n} + 1.

Recalling Lemma 3.1, we directly get the following corollary.

Corollary 3.3 The number of distinct values of the objective function q at KKT points is bounded from above by min{2m + 2, 2n + 1}.

Now we can state the main result of this section.
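Before doing so, a brief aside: the secular-equation argument in the proof above lends itself to a direct numerical check. The sketch below (our own code, not from the paper, assuming NumPy) diagonalizes Q, forms the function g of (7), and locates the nonnegative roots of g(λ) = a^2 by bisection between sign changes on a grid.

```python
import numpy as np

def secular(Q, c):
    """Return the secular function g of eq. (7) and the eigenvalues of Q."""
    lam_i, V = np.linalg.eigh(Q)
    gamma = V.T @ c                        # gamma = V^T c as in (5)
    def g(t):
        with np.errstate(divide="ignore"):
            return float(np.sum(gamma**2 / (lam_i + 2.0 * t)**2))
    return g, lam_i

def nonneg_multipliers(Q, c, a, n_grid=20000):
    """Nonnegative solutions of g(t) = a^2 (system (6)), found by detecting
    sign changes of g - a^2 on a grid and refining by bisection.  A sketch:
    the grid can miss roots that are tangential or very close together."""
    g, lam_i = secular(Q, c)
    t_max = 0.5 * float(np.max(np.abs(lam_i))) + np.linalg.norm(c) / a + 1.0
    ts = np.linspace(0.0, t_max, n_grid)
    h = np.array([g(t) - a**2 for t in ts])
    roots = []
    for k in range(n_grid - 1):
        if h[k] * h[k + 1] < 0.0:          # root bracketed in (ts[k], ts[k+1])
            lo, hi = ts[k], ts[k + 1]
            for _ in range(80):
                mid = 0.5 * (lo + hi)
                if (g(lo) - a**2) * (g(mid) - a**2) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots
```

For example, for Q = diag(−2, 1), c = (1, 1) and a = 1 there is m = 1 negative eigenvalue, and the routine finds three nonnegative multipliers, matching the bound 2m + 1 used in the proof.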
In particular, we show that the peculiarity of Problem (2) can be exploited to escape from KKT points that are not global solutions: whenever we have a KKT point x̄, either x̄ is a global minimizer of Problem (2), or it is possible to compute explicitly a feasible point with a strictly lower value of the objective function. This result is very appealing from a computational point of view, as discussed in Section 6.

Proposition 3.4 Let (x̄, λ̄) be a KKT pair for Problem (2). Let us define the point x̂ in the following way.

(a) If c^T x̄ > 0, then x̂ = −x̄.

(b) If c^T x̄ ≤ 0 and a vector z ∈ R^n such that z^T (Q + 2λ̄ I) z < 0 exists, then:

(i) if ||x̄||^2 < a^2,

    x̂ = x̄ + αz,  with  0 < α ≤ [ −z^T x̄ + ( (z^T x̄)^2 + (a^2 − ||x̄||^2) ||z||^2 )^{1/2} ] / ||z||^2;

(ii) if ||x̄||^2 = a^2 and x̄^T z ≠ 0,

    x̂ = x̄ − 2 (x̄^T z / ||z||^2) z;

(iii) if ||x̄||^2 = a^2 and x̄^T z = 0,

    x̂ = x̄ − 2 [ a^2 / (a^2 + β^2 ||z||^2) ] (x̄ + βz),

with

    β > [ −c^T z + ( (c^T z)^2 + |c^T x̄| |z^T (Q + 2λ̄ I) z| )^{1/2} ] / |z^T (Q + 2λ̄ I) z|.

Then we have q(x̂) < q(x̄) and ||x̂||^2 ≤ a^2.

Proof In case (a), the point x̂ is still feasible and

    q(x̂) = (1/2) x̂^T Q x̂ + c^T x̂ = (1/2) x̄^T Q x̄ − |c^T x̄| = q(x̄) − 2|c^T x̄| < q(x̄).

Now consider case (b). In case (i) we have, by the KKT conditions, that λ̄ = 0, and hence z is a direction of negative curvature for q. Therefore, since ∇q(x̄) = 0, for every α > 0 the point x̂ = x̄ + αz satisfies the inequality

    q(x̄ + αz) = q(x̄) + (1/2) α^2 z^T Q z < q(x̄).

In particular, if we take α ≤ α̃ with

    α̃ = [ −z^T x̄ + ( (z^T x̄)^2 + (a^2 − ||x̄||^2) ||z||^2 )^{1/2} ] / ||z||^2,

we have that ||x̂||^2 ≤ a^2.

Now, let us consider case (ii). Let x̂ be the vector defined as follows:

    x̂ = x̄ − 2 (x̄^T z / ||z||^2) z,

and consider the quadratic function

    L(x, λ̄) = (1/2) x^T (Q + 2λ̄ I) x + c^T x.    (8)

We note that ||x̂||^2 = a^2 and that z is a negative curvature direction for the quadratic function L(·, λ̄). By a simple calculation, taking into account that (Q + 2λ̄ I) x̄ = −c, we get

    L(x̂, λ̄) = L(x̄, λ̄) + 2 [ (x̄^T z)^2 / ||z||^4 ] z^T (Q + 2λ̄ I) z,

and hence L(x̂, λ̄) < L(x̄, λ̄). Hence, recalling the expression (8), we can write

    q(x̂) < q(x̄) + λ̄ ( ||x̄||^2 − ||x̂||^2 ) = q(x̄),

and we get the result for case (ii).

Let us consider case (iii). Let us define the vector s̄ = x̄ + βz with β > 0. We can find a value of β such that s̄ is a negative curvature direction for L(·, λ̄) and s̄^T x̄ ≠ 0, so that we can proceed as in case (ii). In fact, by a simple calculation we have

    s̄^T x̄ = (x̄ + βz)^T x̄ = x̄^T x̄ = a^2,

and, by using the KKT conditions,

    s̄^T (Q + 2λ̄ I) s̄ = x̄^T (Q + 2λ̄ I) x̄ + β^2 z^T (Q + 2λ̄ I) z + 2β x̄^T (Q + 2λ̄ I) z
                      = −β^2 |z^T (Q + 2λ̄ I) z| − 2β c^T z + |c^T x̄|.

By solving the quadratic inequality with respect to β, we get that s̄^T (Q + 2λ̄ I) s̄ < 0 for all β > β̄, where

    β̄ = [ −c^T z + ( (c^T z)^2 + |c^T x̄| |z^T (Q + 2λ̄ I) z| )^{1/2} ] / |z^T (Q + 2λ̄ I) z|.

Hence, by proceeding as in case (ii), we get the result by introducing the point

    x̂ = x̄ − 2 [ x̄^T (x̄ + βz) / ( (x̄ + βz)^T (x̄ + βz) ) ] (x̄ + βz)

with β > β̄.

Remark We note that the local-nonglobal minimizer can correspond either to case (a) with ||x̄||^2 = a^2 or to case (b)(ii).
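To make the construction concrete, here is a small sketch of the escape step of Proposition 3.4 (our own code, not from the paper, assuming NumPy); the direction z must satisfy z^T(Q + 2λ̄I)z < 0 in case (b).

```python
import numpy as np

def escape_step(Q, c, a, xbar, lam, z=None):
    """Given a KKT pair (xbar, lam) of min{q(x) : ||x||^2 <= a^2} that is not
    a global minimizer, return a feasible xhat with q(xhat) < q(xbar),
    following cases (a) and (b)(i)-(iii) of Proposition 3.4."""
    if c @ xbar > 0.0:                               # case (a): reflect xbar
        return -xbar
    M = Q + 2.0 * lam * np.eye(len(xbar))
    w = z @ M @ z                                    # must be negative here
    assert w < 0.0, "z is not a negative curvature direction"
    r = xbar @ xbar
    if r < a**2:                                     # case (b)(i): lam = 0
        alpha = (-(z @ xbar)
                 + np.sqrt((z @ xbar)**2 + (a**2 - r) * (z @ z))) / (z @ z)
        return xbar + alpha * z
    if abs(xbar @ z) > 1e-12:                        # case (b)(ii): reflect in z
        return xbar - 2.0 * (xbar @ z) / (z @ z) * z
    # case (b)(iii): tilt z so that s = xbar + beta*z has negative curvature
    beta = (-(c @ z)
            + np.sqrt((c @ z)**2 + abs(c @ xbar) * abs(w))) / abs(w) + 1.0
    s = xbar + beta * z
    return xbar - 2.0 * (xbar @ s) / (s @ s) * s
```

For instance, with Q = diag(−2, 1) and c = 0, the origin is a KKT point with λ̄ = 0, and case (b)(i) returns the boundary point (a, 0) with the lower objective value −a².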

The preceding proposition shows that, if the KKT point x̄ is not a global minimizer, it is possible to determine a feasible point x̂ such that q(x̂) < q(x̄) by computing at most one direction z such that z^T (Q + 2λ̄ I) z < 0. The existence of such a direction is guaranteed by Proposition 2.1 and, from the numerical point of view, its computation is not an expensive task. In fact, we can obtain such a direction by using, for example, the Bunch-Parlett decomposition [2, 30], modified Cholesky factorizations [10] or, for large scale problems, methods based on the Lanczos algorithm [4].

Now, as the last result of this section, we investigate a regularity property of the local and global minimizers. In particular, we focus our attention on the strict complementarity property, which, roughly speaking, indicates that these points are "really constrained". This property too can be interesting from an algorithmic point of view.

Proposition 3.5 At the local-nonglobal minimizer of Problem (2) the strict complementarity condition holds.

Proof Since x̄ is a local minimizer, the KKT conditions (4) hold. Moreover, the second order necessary conditions require that

    z^T (Q + 2λ̄ I) z ≥ 0  for all z : z^T x̄ = 0.    (9)

By Proposition 2.2 we have that λ̄ ∈ (−λ_2/2, −λ_1/2) and ||x̄||^2 = a^2. Obviously, if λ_1, λ_2 ≥ 0 there is no local-nonglobal minimizer. Furthermore, if λ_1 < 0 and λ_2 ≤ 0, necessarily λ̄ > 0. So we can restrict ourselves to the case

    λ_1 < 0,  λ_2 > 0,

since in this case 0 ∈ (−λ_2/2, −λ_1/2). Let us assume by contradiction that λ̄ = 0. From (4) and Proposition 2.2 we have that

    ∇q(x̄) = 0,
    z^T Q z ≥ 0  for all z : z^T x̄ = 0,
    λ̄ = 0,  ||x̄||^2 = a^2.

Since x̄ is not a global minimizer, by Proposition 2.1 there exists a direction y such that y^T Q y < 0, and from the second order necessary conditions y^T x̄ ≠ 0. We assume, without loss of generality, that y^T x̄ < 0. Let us consider the point x(α) = x̄ + αy with α > 0.
We prove that for sufficiently small values of α the point x(α) is feasible and produces a smaller value of the objective function, thus contradicting the assumption of local optimality. In fact, we have

    ||x(α)||^2 = ||x̄||^2 + 2α y^T x̄ + α^2 ||y||^2,

and hence for α < 2|y^T x̄| / ||y||^2 we obtain ||x(α)||^2 < a^2. Moreover,

    q(x(α)) = q(x̄) + α ∇q(x̄)^T y + (1/2) α^2 y^T Q y = q(x̄) + (1/2) α^2 y^T Q y < q(x̄).

By this proposition and by Proposition 2.1 we directly obtain the following result.

Proposition 3.6 In the nonconvex case, strict complementarity holds at every local or global minimizer.

4 Unconstrained formulation

In this section we show that Problem (2) is equivalent to the unconstrained minimization of a piecewise quartic merit function. A general constrained optimization problem can be transformed into an unconstrained problem by defining a continuously differentiable exact penalty function, following, for example, the approach proposed in [6, 7]. However, in the special case of the minimization of a quadratic function with box constraints, it has been shown in [16] and [23] that it is possible to define simpler penalty functions by exploiting the particular structure of the problem. In the same spirit as these papers, we show that also for Problem (2) it is possible to construct a particular continuously differentiable penalty function. This new penalty function takes full advantage of the peculiarities of the trust region problem and enjoys distinguishing features that make its unconstrained minimization significantly simpler in comparison with the unconstrained minimization of the penalty functions proposed in [6, 7].
The main properties of the penalty function proposed in this section are:

- it is globally exact according to the definition of [7];
- it does not require any shifted barrier term, hence it is defined on the whole space;
- it has a very simple expression (it is piecewise quartic);
- it is known, a priori, for which values of the penalty parameter the correspondence between the constrained problem and the unconstrained one holds.

As a first step towards the definition of the exact penalty function, we recall the Hestenes-Powell-Rockafellar augmented Lagrangian function [32, 18]

    La(x, λ, ε) = q(x) + (ε/4) [ max{ 0, (2/ε)(||x||^2 − a^2) + λ }^2 − λ^2 ],

where λ ∈ R and ε is a given positive parameter.

Now, according to the classical approach, we replace the multiplier λ in the function La with a multiplier function λ(x) : R^n → R, which yields an estimate of the multiplier associated with Problem (2) as a function of the variables x. In the literature, different multiplier functions have been proposed (see e.g. [9, 13, 6, 7, 23]). However, the expressions of the multiplier functions given in [9, 13, 6, 7] are not defined at the origin of the space. Here we define a new, simpler multiplier function, defined on the whole space R^n, whose expression is the following:

    λ(x) = −(1/(2a^2)) ( x^T Q x + c^T x ).    (10)

Its properties are summarized in the following proposition.

Proposition 4.1

(i) λ(x) is continuously differentiable with gradient

    ∇λ(x) = −(1/(2a^2)) (2Qx + c).

(ii) If (x̄, λ̄) is a KKT pair for Problem (2), then we have λ(x̄) = λ̄.

(iii) For every x ∈ R^n we have x^T ∇q(x) + 2λ(x)||x||^2 = 2λ(x)(||x||^2 − a^2).

Proof Part (i) easily follows from the definition of the multiplier function (10). As regards part (ii), from (4) we have that a KKT pair (x̄, λ̄) satisfies

    x̄^T Q x̄ + c^T x̄ + 2λ̄ ||x̄||^2 = 0.    (11)

It is easy to see that if ||x̄||^2 = a^2, (11) corresponds exactly to the definition of the multiplier function (10). Otherwise, if ||x̄||^2 < a^2, (4) implies that λ̄ = 0, and hence, by comparing (11) and (10), that λ(x̄) = 0.

Now let us consider part (iii). By simple calculations we have

    x^T ∇q(x) + 2λ(x)||x||^2 = x^T Q x + c^T x − (||x||^2 / a^2)(x^T Q x + c^T x)
                             = −(1/a^2)(x^T Q x + c^T x)(||x||^2 − a^2) = 2λ(x)(||x||^2 − a^2).

On the basis of the previous considerations, we can replace the multiplier λ in the function La with the multiplier function λ(x). Furthermore, as regards the penalty parameter ε, we can select, a priori, an interval of suitable values depending on the problem data Q, c, a. Therefore, we are now ready to define our merit function P(x) = La(x, λ(x), ε(Q, c, a)), that is

    P(x) = q(x) + (ε/4) [ max{ 0, (2/ε)(||x||^2 − a^2) + λ(x) }^2 − λ(x)^2 ],    (12)

where λ(x) is the quadratic function given by (10) and ε is any parameter that satisfies the following inequality:

    0 < ε < 16a^4 / ( a^2 (8||Q|| + 3) + 5||c||^2 ).    (13)
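As an illustration (our own sketch, not from the paper, assuming NumPy), the quantities (10), (12) and (13) are straightforward to implement; `make_penalty` below returns q, the multiplier function λ(·), P, and an ε chosen strictly inside the bound (13).

```python
import numpy as np

def make_penalty(Q, c, a):
    """Build q, the multiplier function (10), the merit function P of (12),
    and a penalty parameter eps satisfying the bound (13)."""
    normQ = np.linalg.norm(Q, 2)                     # spectral norm ||Q||
    eps = 0.5 * 16.0 * a**4 / (a**2 * (8.0 * normQ + 3.0) + 5.0 * (c @ c))
    def q(x):
        return 0.5 * x @ Q @ x + c @ x
    def lam(x):                                      # eq. (10)
        return -(x @ Q @ x + c @ x) / (2.0 * a**2)
    def P(x):                                        # eq. (12)
        m = max(0.0, (2.0 / eps) * (x @ x - a**2) + lam(x))
        return q(x) + 0.25 * eps * (m**2 - lam(x)**2)
    return q, lam, P, eps
```

For Q = diag(−2, 1), c = 0 and a = 1, the global minimizers are x* = (±1, 0) with multiplier λ̄ = 1; one can check that λ(x*) = 1 and P(x*) = q(x*) = −1, and that P(x) ≤ q(x) on the feasible ball, in agreement with Proposition 4.2(iv) below.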

First, we show some immediate properties of the merit function P.

Proposition 4.2

(i) P is continuously differentiable with gradient

    ∇P(x) = Qx + c − (ε/2) λ(x) ∇λ(x) + (ε/2) max{ 0, (2/ε)(||x||^2 − a^2) + λ(x) } ( (4/ε)x + ∇λ(x) );

(ii) P is twice continuously differentiable except at points where

    (2/ε)(||x||^2 − a^2) + λ(x) = 0;

(iii) P is twice continuously differentiable in a neighborhood of a KKT point x̄ where strict complementarity holds;

(iv) for every x such that ||x||^2 ≤ a^2 we have P(x) ≤ q(x);

(v) the penalty function P is coercive, and hence it admits a global minimizer.

Proof Parts (i), (ii) and (iii) directly follow from the expression of the penalty function P. Part (iv) follows from a classical result on penalty functions (see Theorem 2 of [7]).

As regards part (v), we want to show that P(x) goes to infinity as ||x|| → ∞. First, we observe that

    (2/ε)(||x||^2 − a^2) + λ(x) ≥ ( 2/ε − ||Q||/(2a^2) ) ||x||^2 − (1/(2a^2)) ||c|| ||x|| − 2a^2/ε;

hence for sufficiently large values of ||x|| the leading term of the preceding inequality is strictly positive since, recalling that ε satisfies (13), we have ε ≤ 4a^2/||Q||. Then, for sufficiently large values of ||x||, we can assume that

    max{ 0, (2/ε)(||x||^2 − a^2) + λ(x) } = (2/ε)(||x||^2 − a^2) + λ(x).

By a simple calculation, the expression of the penalty function becomes in this case

    P(x) = (1/2) x^T Q x + c^T x + (1/ε)(||x||^2 − a^2)^2 + λ(x)(||x||^2 − a^2),

and the following inequality holds:

    P(x) ≥ ( 1/ε − ||Q||/(2a^2) ) ||x||^4 − (||c||/(2a^2)) ||x||^3 − ( 2a^2/ε + ||Q|| ) ||x||^2 − (3/2)||c|| ||x|| + a^4/ε.

As ε satisfies (13), we have ε ≤ 2a^2/||Q||, and hence we get lim_{||x||→∞} P(x) = ∞. The existence of the global minimizer immediately follows from the continuity of P and the compactness of its level sets.

Now, we state the first result about the exactness properties of the penalty function P. Since its proof is technical and lengthy, we report it in the Appendix.

Proposition 4.3 A point x̄ ∈ R^n is a stationary point of P if and only if (x̄, λ(x̄)) is a KKT pair for Problem (2). Furthermore, at such a point we have P(x̄) = q(x̄).

Now we prove that there is a one to one correspondence between global minimizers of Problem (2) and global minimizers of the penalty function P.

Proposition 4.4 Every global minimizer of Problem (2) is a global minimizer of P, and conversely.

Proof By Proposition 4.2(v), the penalty function P admits a global minimizer x̂, which is obviously a stationary point of P, and hence by the preceding proposition we have that P(x̂) = q(x̂). On the other hand, if x* is a global minimizer of Problem (2), it is also a KKT point, and hence the preceding proposition implies again that P(x*) = q(x*). Now, we proceed by contradiction. Assume that a global minimizer x̂ of P is not a global minimizer of Problem (2); then there would exist a point x*, global minimizer of Problem (2), such that

    P(x̂) = q(x̂) > q(x*) = P(x*),

which contradicts the assumption that x̂ is a global minimizer of P. The converse follows by analogous considerations.

In order to complete the correspondence between the solution of Problem (2) and the unconstrained minimization of the penalty function P, we prove the following result, which considers the correspondence between local minimizers.

Proposition 4.5 The function P admits at most one local-nonglobal minimizer x̄, which is a local minimizer of Problem (2), and λ(x̄) is the associated KKT multiplier.

Proof We first prove that if x̄ is a local minimizer of P, then the pair (x̄, λ(x̄)) satisfies the KKT conditions for Problem (2).
Moreover, by Proposition 4.3, we have that P(x̄) = q(x̄), and hence, since x̄ is a local minimizer of P, there exists a neighborhood Ω(x̄) of x̄ such that

    q(x̄) = P(x̄) ≤ P(x)  for all x ∈ Ω(x̄).

Thus, by using (iv) of Proposition 4.2, we obtain

    q(x̄) ≤ P(x) ≤ q(x)  for all x ∈ Ω(x̄) ∩ F,    (14)

and hence x̄ is a local minimizer of Problem (2). The proof can be easily completed by recalling Proposition 2.2.

5 A new second order optimality condition

The results given in Section 3 and Section 4 can be combined to state new theoretical properties of Problem (2). In this section we introduce a new second order necessary optimality condition for Problem (2) in the nonconvex case, which follows from the unconstrained formulation.

Proposition 5.1 Assume that Q is not positive semidefinite. If x̄ is a global (local) minimizer of Problem (2), then there exists a unique λ̄ > 0 such that the KKT conditions (4) hold and the matrix

    Q + 2λ̄ I + (1/a^2)(c x̄^T + x̄ c^T) + ( (8/a^2) λ̄ + 2/ε ) x̄ x̄^T

is positive semidefinite for every ε satisfying (13).

Proof If x̄ is a global minimizer of Problem (2), by Proposition 3.6 we have that λ̄ > 0 and, hence, ||x̄||^2 = a^2. Then, since (2/ε)(||x̄||^2 − a^2) + λ(x̄) = λ̄ > 0, there exists a neighborhood Ω(x̄) of x̄ in which

    (2/ε)(||x||^2 − a^2) + λ(x) ≠ 0.

Thus, by (ii) of Proposition 4.2, the function P is twice continuously differentiable in Ω(x̄), and the Hessian matrix evaluated at x̄ is given by

    ∇²P(x̄) = Q + 2λ(x̄) I + (1/a^2)(c x̄^T + x̄ c^T) + ( (8/a^2) λ(x̄) + 2/ε ) x̄ x̄^T.

By Proposition 4.4, x̄ is also a global minimizer of P, and therefore x̄ satisfies the second order necessary conditions for an unconstrained global minimizer of P, that is, ∇²P(x̄) is positive semidefinite. Then the result follows.

Recalling point (a) of Proposition 3.4, at a global minimizer x̄ we have c^T x̄ ≤ 0. Hence, the matrix

    (1/a^2)(c x̄^T + x̄ c^T) + ( (8/a^2) λ̄ + 2/ε ) x̄ x̄^T

is not necessarily positive semidefinite. A similar second order necessary condition was given in [1], where it has been proved, without requiring any assumption on the matrix Q, that if the global minimum is on the boundary, then the matrix

    ∇²L(x̄, λ̄) + (1/a^2)(c x̄^T + x̄ c^T) − (1/a^4)(c^T x̄) x̄ x̄^T

is positive semidefinite, where again the matrix (1/a^2)(c x̄^T + x̄ c^T) − (1/a^4)(c^T x̄) x̄ x̄^T is not necessarily positive semidefinite.
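The condition is easy to test numerically. A sketch (our own code, not from the paper, assuming NumPy) that assembles the matrix of Proposition 5.1 and checks positive semidefiniteness via its smallest eigenvalue:

```python
import numpy as np

def second_order_matrix(Q, c, a, xbar, lam, eps):
    """Matrix of the second order necessary condition of Proposition 5.1;
    eps is assumed to satisfy the bound (13)."""
    n = len(xbar)
    return (Q + 2.0 * lam * np.eye(n)
            + (np.outer(c, xbar) + np.outer(xbar, c)) / a**2
            + ((8.0 / a**2) * lam + 2.0 / eps) * np.outer(xbar, xbar))

def is_psd(M, tol=1e-10):
    """Positive semidefiniteness up to a small tolerance."""
    return float(np.linalg.eigvalsh(M).min()) >= -tol
```

For Q = diag(−2, 1), c = 0 and a = 1, the global minimizer x* = (1, 0) with λ̄ = 1 passes the test, while the non-global KKT point x̄ = 0 with λ̄ = 0 fails it (the matrix reduces to Q, which is indefinite).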

6 Algorithmic applications

Besides their own theoretical interest, the results of the preceding sections are appealing also from a computational point of view. Although the study of a numerical algorithm for the solution of Problem (2) is beyond the scope of this paper, in this section we give a hint of possible algorithmic applications of the results of Section 3 and Section 4. We recall that Proposition 3.4 ensures that, given a KKT point which is not a global solution of Problem (2), it is possible to find a new feasible point with a lower value of the objective function, and that Proposition 3.2 states that the number of KKT points with different values of the objective function is finite.

These results indicate a new possibility for tackling large scale trust region problems. In fact, they show that a global minimum point of Problem (2) could be efficiently computed by applying, a finite number of times, a constrained optimization algorithm with the following features:

(i) given a feasible starting point, it is able to locate a KKT point with a lower value of the objective function;

(ii) it presents a "good" (at least superlinear) rate of convergence;

(iii) it does not require a heavy computational burden.

A possibility for ensuring property (i) is to use any feasible method that forces the decrease of the objective function, following, for example, the approach of [37, 17]. Another possibility is to exploit the unconstrained reformulation of Problem (2) described in Section 4, which allows us to use any unconstrained method for the minimization of the penalty function P. In fact, starting from a point x_0, any such algorithm obtains a stationary point x̄ of P such that P(x̄) < P(x_0). Then, Proposition 4.3 ensures that x̄ is a KKT point of Problem (2) and that P(x̄) = q(x̄).
On the other hand, if x_0 is a feasible point, part (iv) of Proposition 4.2 yields that

    q(x̄) = P(x̄) < P(x_0) ≤ q(x_0).

In conclusion, by using an unconstrained algorithm we get a KKT point of Problem (2) with a value of the objective function lower than the value at the starting point.

Furthermore, the possibility of transforming the trust region problem into an unconstrained one seems to be quite appealing also as regards properties (ii) and (iii). In fact, Proposition 3.6 and (iii) of Proposition 4.2 guarantee that, in the nonconvex case, the penalty function is twice continuously differentiable at every local and global minimizer of the problem. Therefore, in this case, any unconstrained Truncated Newton algorithm (see for example [5, 37, 15]) can be easily adapted in order to define globally convergent methods which show a superlinear rate of convergence in a neighborhood of every global or local minimizer.

Nevertheless, we can define algorithms with a superlinear rate of convergence without requiring that the penalty function be twice continuously differentiable in the neighborhood of the points of interest, that is, without requiring strict complementarity at these points. In fact, we can draw our inspiration from the results in [8]. In particular, we can define a search direction d_k as follows:

(i) if ||x_k||^2 − a^2 ≥ −(ε/2) λ(x_k), solve

    [ Q + 2λ(x_k) I    2x_k ] [ d_k ]     [ Q x_k + c       ]
    [ 2x_k^T           0    ] [ z_k ] = − [ ||x_k||^2 − a^2 ];    (15)

(ii) if ||x_k||^2 − a^2 < −(ε/2) λ(x_k), solve

    (Q + 2λ(x_k) I) d_k = −(Q x_k + c).    (16)

The results of [8] ensure that the algorithm x_{k+1} = x_k + d_k is locally superlinearly convergent without requiring strict complementarity. Following the approach of truncated Newton methods (see for example [5, 15]), in [24] it is shown that an approximate solution d̃_k of (15)-(16) preserves the local superlinear rate of convergence of the algorithmic scheme. Furthermore, it is also proved that this direction d̃_k satisfies suitable descent conditions with respect to the penalty function P. This strict connection between the direction d̃_k and the penalty function P allows us to define globally and superlinearly convergent algorithms of the type

    x_{k+1} = x_k + α_k d̃_k,    (17)

where α_k can be determined by any stabilization technique and d̃_k is computed by using a conjugate gradient based iterative method for solving approximately the linear system (15) or (16). The paper [24] is devoted to a complete description of this approach, with the analysis of its theoretical properties and the definition of an efficient algorithm.
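As an illustration (our own sketch, assuming NumPy, with dense solves in place of the conjugate gradient iterations used in [24]), the direction of (15)-(16) can be computed as:

```python
import numpy as np

def newton_direction(Q, c, a, x, eps):
    """Search direction d_k of (15)-(16): a bordered system on or near the
    boundary, a plain Newton-type system otherwise."""
    n = len(x)
    lam = -(x @ Q @ x + c @ x) / (2.0 * a**2)        # multiplier function (10)
    M = Q + 2.0 * lam * np.eye(n)
    r = x @ x - a**2
    if r >= -(eps / 2.0) * lam:                      # system (15)
        K = np.block([[M, 2.0 * x[:, None]],
                      [2.0 * x[None, :], np.zeros((1, 1))]])
        rhs = -np.concatenate([Q @ x + c, [r]])
        return np.linalg.solve(K, rhs)[:n]
    return np.linalg.solve(M, -(Q @ x + c))          # system (16)
```

On the convex example Q = I, c = (−4, 0), a = 1, the solution is x* = (1, 0) with multiplier λ̄ = 3/2; the pure iteration x_{k+1} = x_k + d_k started near x* converges to it rapidly, and d_k vanishes at x* itself.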
Here, inorder to have only a preliminary idea of the viability of this unconstrained approach forsolving Problem (2), we have performed some numerical experiments with a rough im-plementation of algorithm (17) where �k is determined by the line-search technique of[14] and ~dk is computed by a conjugate gradient algorithm similar to the one proposedin [5].We coded the algorithm in MATLAB and run the experiments on a IBM/RISC 6000.We run two sets of problems randomly generated that we take from the collection of[34]. We solved ten related problems for each of the two classes both with the easy andthe hard case. According to [34], the hard case occurs when the vector c is orthogonalto the subspace generated by the smallest eigenvalue of the matrix Q. In Table 1 wereport the results in terms of average number of iterations for problems with increasingdimension (n = 100; 256; 324).We run also a set of near hard-case problems (with n = 100; 256; 324), that is withc nearly orthogonal to the subspace of the smallest eigenvalue of Q. The results are16

            FIRST SET                SECOND SET
   n     EASY CASE  HARD CASE    EASY CASE  HARD CASE
  100      11.3       21.9         10.7       25.6
  256      11         23           11.9       27.9
  324      12.9       23.6         12.9       29.9

          Table 1: Average number of iterations

                    NEAR HARD CASE
  mult. of $\lambda_{\min}$      1       5      10
  n = 100                       8.8    10.7    9.9
  n = 256                       9.1     6.9    9.1
  n = 324                      11       9.4    9.2

          Table 2: Average number of iterations

reported in Table 2. We tested the invariance with respect to the multiplicity of the smallest eigenvalue (mult. of $\lambda_{\min} = 1, 5, 10$).
The results obtained are encouraging. The number of iterations is almost constant as the dimension increases. This feature is appealing when solving large scale problems, taking into account that, at each iteration, the main effort is due to the approximate solution of a linear system of dimension $n$ or $n-1$, which requires only matrix-vector products. Furthermore, the efficiency of the algorithm does not seem to be seriously affected by the occurrence of the hard case, while it is completely insensitive to the near-hard case.
Of course, even if no final conclusion can be drawn from these limited numerical experiments, the results obtained encourage further research in defining new algorithms for solving large scale trust region problems which use the results described in this paper. In particular, as we said before, the possibility of defining efficient algorithms based on the unconstrained reformulation is investigated in [24].

Acknowledgments

We wish to thank S. Santos, D. Sorensen, F. Rendl and H. Wolkowicz for providing us with their Matlab codes and test problems. Moreover, we thank the anonymous referees for their helpful suggestions, which helped improve the paper.

References

[1] A. Bagchi and B. Kalantari. New optimality conditions and algorithms for homogeneous and polynomial optimization over spheres. Technical Report 40-90, RUTCOR, Rutgers University, New Brunswick, NJ 08903, 1990.
[2] J.R. Bunch and B.N. Parlett. Direct methods for solving symmetric indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 8:639–655, 1971.
[3] T. F. Coleman and C. Hempel. Computing a trust region step for a penalty function. SIAM Journal on Scientific and Statistical Computing, 11:180–201, 1990.
[4] J. K. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Birkhäuser, 1985.
[5] R.S. Dembo and T. Steihaug. Truncated-Newton algorithms for large-scale unconstrained optimization. Mathematical Programming, 26:190–212, 1983.
[6] G. Di Pillo and L. Grippo. An exact penalty method with global convergence properties for nonlinear programming problems. Mathematical Programming, 36:1–18, 1986.
[7] G. Di Pillo and L. Grippo. Exact penalty functions in constrained optimization. SIAM Journal on Control and Optimization, 27(6):1333–1360, 1989.
[8] F. Facchinei and S. Lucidi. Quadratically and superlinearly convergent algorithms for the solution of inequality constrained optimization problems. Journal of Optimization Theory and Applications, 85(2), 1995.
[9] R. Fletcher. A class of methods for nonlinear programming with termination and convergence properties. In J. Abadie, editor, Integer and Nonlinear Programming, pages 157–173, Amsterdam, 1970. North-Holland.
[10] A. Forsgren, P.E. Gill, and W. Murray. Computing modified Newton directions using a partial Cholesky factorization. SIAM Journal on Scientific Computing, 16:139–150, 1995.
[11] G. E. Forsythe and G. H. Golub. On the stationary values of a second-degree polynomial on the unit sphere. J. Soc. Indust. Appl. Math., 13(4):1050–1068, 1965.
[12] D. M. Gay. Computing optimal locally constrained steps. SIAM Journal on Scientific and Statistical Computing, 2(2):186–197, 1981.
[13] T. Glad and E. Polak.
A multiplier method with automatic limitation of penalty growth. Mathematical Programming, 17:140–155, 1979.
[14] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23:707–716, 1986.

[15] L. Grippo, F. Lampariello, and S. Lucidi. A truncated Newton method with nonmonotone line search for unconstrained optimization. Journal of Optimization Theory and Applications, 60:401–419, 1989.
[16] L. Grippo and S. Lucidi. A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization, 22:557–578, 1991.
[17] M. Heinkenschloss. On the solution of a two ball trust region subproblem. Mathematical Programming, 64:249–276, 1994.
[18] M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:303–320, 1969.
[19] A. Kamath and N. Karmarkar. A continuous approach to compute upper bounds in quadratic maximization problems with integer constraints. In C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimization, pages 125–140, Princeton University, 1991. Princeton University Press.
[20] S. Kapoor and P. Vaidya. Fast algorithms for convex quadratic programming and multicommodity flows. In Proc. 18th Annual ACM Symp. Theory Comput., pages 147–159, 1986.
[21] N. Karmarkar. An interior-point approach to NP-complete problems. In Proceedings of the Mathematical Programming Society Conference on Integer Programming and Combinatorial Optimization, pages 351–366, 1990.
[22] N. Karmarkar, M. G. C. Resende, and K.G. Ramakrishnan. An interior point algorithm to solve computationally difficult set covering problems. Mathematical Programming, 52:597–618, 1991.
[23] W. Li. A differentiable piecewise quadratic exact penalty function for quadratic programs with simple bound constraints. Technical report, Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, 1994.
[24] S. Lucidi and L. Palagi. A class of algorithms for large scale trust region subproblems. Technical Report Preliminary Draft 1996, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy, 1996.
[25] S. Lucidi, L. Palagi, and M. Roma.
Quadratic programs with quadratic constraint: characterization of KKT points and equivalence with an unconstrained problem. Technical Report 24.94, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy, 1994.
[26] J. M. Martínez. Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM Journal on Optimization, 4(1):159–176, 1994.
[27] J. M. Martínez and S. A. Santos. Trust region algorithms on arbitrary domains. Mathematical Programming, 68:267–302, 1995.

[28] J. J. Moré. Generalization of the trust region problem. Optimization Methods and Software, 2:189–209, 1993.
[29] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983.
[30] J.J. Moré and D.C. Sorensen. On the use of directions of negative curvature in a modified Newton method. Mathematical Programming, 16:1–20, 1979.
[31] P. M. Pardalos, Y. Ye, and C.-G. Han. Algorithms for the solution of quadratic knapsack problems. Linear Algebra Appl., 25:69–91, 1991.
[32] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, New York, 1969.
[33] F. Rendl and H. Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Technical Report CORR 94-32, University of Waterloo, Dept. of Combinatorics and Optimization, 1994.
[34] S. A. Santos and D. C. Sorensen. A new matrix-free algorithm for the large scale trust region subproblem. Technical Report TR95-20, Rice University, Houston, TX, 1995.
[35] D. C. Sorensen. Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, 1982.
[36] D. C. Sorensen. Minimization of a large scale quadratic function subject to an ellipsoidal constraint. Technical Report TR94-27, Rice University, Houston, TX, 1994.
[37] Ph.L. Toint. Towards an efficient sparsity exploiting Newton method for minimization. In I. S. Duff, editor, Sparse Matrices and Their Uses, pages 57–88, London, 1981. Academic Press.
[38] S. A. Vavasis. Nonlinear Optimization. Oxford University Press, 1991.
[39] S. A. Vavasis and R. Zippel. Proving polynomial-time for sphere-constrained quadratic programming. Technical Report 90-1182, Department of Computer Science, Cornell University, Ithaca, New York, 1990.
[40] Y. Ye. A new complexity result on minimization of a quadratic function with a sphere constraint. In C. A. Floudas and P.
M. Pardalos, editors, Recent Advances in Global Optimization, pages 19–31, Princeton University, 1991. Princeton University Press.
[41] Y. Ye. On affine scaling algorithms for nonconvex quadratic programming. Mathematical Programming, 56:285–300, 1992.
[42] Y. Ye and E. Tse. An extension of Karmarkar's projective algorithm for convex quadratic programming. Mathematical Programming, 44:157–179, 1989.

Appendix

Before giving the proof of Proposition 4.3, we report this well known result, whose proof is immediate.

Lemma A.1
$$\max\left\{\|\bar x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(\bar x)\right\} = 0$$
if and only if
$$\|\bar x\|^2 \le a^2, \qquad \lambda(\bar x) \ge 0, \qquad \lambda(\bar x)\left(\|\bar x\|^2 - a^2\right) = 0.$$

Proof of Proposition 4.3. First, we note that we can rewrite the gradient of $P$ as follows:
$$\nabla P(x) = \nabla q(x) + 2\lambda(x)x + \nabla\lambda(x)\max\left\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right\} + \frac{4}{\varepsilon}\,x\max\left\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right\}. \qquad (18)$$

If part. Assume that $(\bar x,\bar\lambda)$ is a KKT pair for Problem (2); then, by recalling Lemma A.1, the result easily follows from the expression (18) of the gradient $\nabla P(x)$.

Only if part. In order to prove this part, it is enough to show that
$$\max\left\{\|\bar x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(\bar x)\right\} = 0. \qquad (19)$$
In fact, from the expression (18) of the gradient of $P$ we have immediately that
$$0 = \nabla P(\bar x) = \nabla q(\bar x) + 2\lambda(\bar x)\bar x,$$
which, together with Lemma A.1, implies that the point $(\bar x, \lambda(\bar x))$ is a KKT point of Problem (2). We now turn to prove (19). We distinguish the cases $\bar x = 0$ and $\bar x \ne 0$. The point $\bar x = 0$ is a stationary point for $P$ if and only if $c = 0$. On the other hand, the point $\bar x = 0$ is a KKT point for Problem (2) if and only if $c = 0$.

Now, we assume $\bar x \ne 0$. By (18) we can write, for every $x \ne 0$,
$$\begin{aligned}
\varepsilon x^T \nabla P(x) &= \varepsilon x^T \nabla q(x) + 2\varepsilon\lambda(x)\|x\|^2 + 4\|x\|^2 \max\left\{\|x\|^2 - a^2,\ -\tfrac{\varepsilon}{2}\lambda(x)\right\} + \varepsilon x^T \nabla\lambda(x)\max\left\{\|x\|^2 - a^2,\ -\tfrac{\varepsilon}{2}\lambda(x)\right\}\\
&= 2\varepsilon\lambda(x)\left(\|x\|^2 - a^2\right) + \left(\varepsilon x^T \nabla\lambda(x) + 4\|x\|^2\right)\max\left\{\|x\|^2 - a^2,\ -\tfrac{\varepsilon}{2}\lambda(x)\right\}.
\end{aligned}$$
Hence, taking into account that
$$2\varepsilon\lambda(x)\left(\|x\|^2 - a^2\right) = \left(2\varepsilon\lambda(x) - 4\left(\|x\|^2 - a^2\right)\right)\max\left\{\|x\|^2 - a^2,\ -\tfrac{\varepsilon}{2}\lambda(x)\right\} + 4\max\left\{\|x\|^2 - a^2,\ -\tfrac{\varepsilon}{2}\lambda(x)\right\}^2,$$

it easily follows that
$$\varepsilon x^T \nabla P(x) = \max\left\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right\} M(x,\varepsilon), \qquad (20)$$
where
$$M(x,\varepsilon) = \varepsilon x^T \nabla\lambda(x) + 2\varepsilon\lambda(x) + 4a^2 + 4\max\left\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right\}. \qquad (21)$$
Now, our aim is to show that $M(x,\varepsilon)$ is strictly positive for every $x \in \mathbb{R}^n$.
First, we consider the case $\max\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\} = \|x\|^2 - a^2$, that is,
$$\|x\|^2 - a^2 \ge \frac{\varepsilon}{4a^2}\left(x^T Q x + c^T x\right).$$
By simple calculation we get the inequality
$$\|x\|^2 \ge \frac{8a^4 - \varepsilon\|c\|^2}{8a^2 + \varepsilon\left(2\|Q\|+1\right)}. \qquad (22)$$
In this case we have
$$\begin{aligned}
M(x,\varepsilon) &= 4\|x\|^2 + \varepsilon x^T \nabla\lambda(x) + 2\varepsilon\lambda(x) = 4\|x\|^2 - \frac{\varepsilon}{2a^2}\left(4x^T Q x + 3c^T x\right)\\
&\ge 4\|x\|^2 - \frac{\varepsilon}{2a^2}\left(4\|Q\|\|x\|^2 + \frac{3}{2}\left(\|x\|^2 + \|c\|^2\right)\right)\\
&= \frac{1}{4a^2}\left[\|x\|^2\left(16a^2 - \varepsilon(8\|Q\|+3)\right) - 3\|c\|^2\varepsilon\right].
\end{aligned}$$
Since $\varepsilon$ satisfies (13), the term $16a^2 - \varepsilon(8\|Q\|+3)$ is positive, and by using (22) we can write the following inequality:
$$M(x,\varepsilon) \ge \frac{\varepsilon^2\|c\|^2\|Q\| - 4a^2 p_1 \varepsilon + 64a^6}{2a^2\left(8a^2 + \varepsilon\left(2\|Q\|+1\right)\right)}, \qquad (23)$$
where
$$p_1 = a^2\left(8\|Q\|+3\right) + 5\|c\|^2. \qquad (24)$$
The numerator of the right-hand side of (23) is a quadratic function in $\varepsilon$ which assumes positive values in the interval $(0,\varepsilon_1)$, where
$$\varepsilon_1 = b\left(p_1 - \left(p_1^2 - q\right)^{1/2}\right) \quad\text{with}\quad q = 16a^2\|c\|^2\|Q\|, \qquad b = \frac{2a^2}{\|c\|^2\|Q\|}. \qquad (25)$$
Now, we note the following relationships:
$$\varepsilon_1 = \frac{bq}{p_1 + \left(p_1^2 - q\right)^{1/2}} \ge \frac{bq}{2p_1} = \frac{16a^4}{a^2\left(8\|Q\|+3\right) + 5\|c\|^2},$$

which, by the choice (13) for the parameter $\varepsilon$, implies that $\varepsilon \in (0,\varepsilon_1)$. Therefore, for all $x$ such that $\max\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\} = \|x\|^2 - a^2$ and for all $\varepsilon$ satisfying (13), we get
$$M(x,\varepsilon) > 0.$$
Now, let us consider the case $\max\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\} = -\frac{\varepsilon}{2}\lambda(x)$, that is,
$$\|x\|^2 - a^2 \le \frac{\varepsilon}{4a^2}\left(x^T Q x + c^T x\right).$$
By simple calculation, and by the fact that $\varepsilon$ satisfies (13), we have that $8a^2 - \varepsilon(2\|Q\|+1)$ is positive, and hence we have the inequality
$$\|x\|^2 \le \frac{\varepsilon\|c\|^2 + 8a^4}{8a^2 - \varepsilon\left(2\|Q\|+1\right)}. \qquad (26)$$
In this case we have that
$$\begin{aligned}
M(x,\varepsilon) &= \varepsilon x^T \nabla\lambda(x) + 4a^2 = -\frac{\varepsilon}{2a^2}\left(2x^T Q x + c^T x\right) + 4a^2\\
&\ge -\frac{\varepsilon}{4a^2}\left[\left(4\|Q\|+1\right)\|x\|^2 + \|c\|^2\right] + 4a^2\\
&\ge \frac{-\|c\|^2\|Q\|\varepsilon^2 - 4a^2 p_2 \varepsilon + 64a^6}{2a^2\left(8a^2 + \varepsilon\left(2\|Q\|+1\right)\right)}, \qquad (27)
\end{aligned}$$
where
$$p_2 = a^2\left(8\|Q\|+3\right) + \|c\|^2.$$
The numerator of the last term of (27) is a quadratic function in $\varepsilon$ which assumes positive values in the interval $(0,\varepsilon_2)$, where
$$\varepsilon_2 = b\left(-p_2 + \left(p_2^2 + q\right)^{1/2}\right)$$
and $q$, $b$ are already defined in (25). Let us observe that $p_1, p_2 > 0$ and $p_1 = p_2 + 4\|c\|^2$, where $p_1$ is given by (24); it is easily seen that $p_2^2 + q \le p_1^2$, hence the following inequality holds:
$$-p_2 + \left(p_2^2 + q\right)^{1/2} = \frac{q}{p_2 + \left(p_2^2 + q\right)^{1/2}} \ge \frac{q}{2p_1}.$$
Now, the choice (13) for the parameter $\varepsilon$ implies that $\varepsilon \in (0,\varepsilon_2)$, and hence for every $x$ such that $\max\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\} = -\frac{\varepsilon}{2}\lambda(x)$ we have
$$M(x,\varepsilon) > 0.$$
Therefore, if $\bar x$ is a point such that $\nabla P(\bar x) = 0$, recalling (20) we necessarily have that $\max\{\|\bar x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(\bar x)\} = 0$, which is equivalent to the conditions (19). Hence, we have proved the first part of the proposition.
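The positivity of $M(x,\varepsilon)$ can also be checked numerically. The sketch below is only illustrative and makes two assumptions not spelled out in this excerpt: that the multiplier function is $\lambda(x) = -(x^TQx + c^Tx)/(2a^2)$ (the form consistent with the identities used in the proof), and that condition (13) forces $\varepsilon$ below the bound $16a^4/p_1$ suggested by the estimate following (25). Here $\|Q\|$ is taken as the spectral norm of the symmetric matrix $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem data: q(x) = 1/2 x'Qx + c'x, constraint ||x||^2 <= a^2.
n, a = 5, 1.0
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2  # symmetric, possibly indefinite
c = rng.standard_normal(n)
normQ = np.linalg.norm(Q, 2)                        # spectral norm ||Q||

# Assumed multiplier function and its gradient:
#   lambda(x) = -(x'Qx + c'x) / (2 a^2),  grad lambda(x) = -(2Qx + c) / (2 a^2).
lam = lambda x: -(x @ Q @ x + c @ x) / (2 * a**2)
glam = lambda x: -(2 * Q @ x + c) / (2 * a**2)

# Take eps safely below the threshold 16 a^4 / p1, with p1 as in (24).
p1 = a**2 * (8 * normQ + 3) + 5 * np.linalg.norm(c)**2
eps = 0.5 * 16 * a**4 / p1

def M(x):
    # M(x, eps) as defined in (21).
    mx = max(x @ x - a**2, -(eps / 2) * lam(x))
    return eps * x @ glam(x) + 2 * eps * lam(x) + 4 * a**2 + 4 * mx

# Sample points of small, moderate and large norm; M should be positive at all of them.
samples = [rng.standard_normal(n) * s for s in (0.1, 1.0, 10.0) for _ in range(200)]
print(min(M(x) for x in samples) > 0)
```

With $\varepsilon$ in the indicated range, the minimum over the samples stays strictly positive, in agreement with the two-case estimate above.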

Now, in order to complete the proof, we have to show that $P(\bar x) = q(\bar x)$. This easily follows from Lemma A.1, Proposition 4.1 (ii) and by observing that the penalty function can be rewritten as
$$P(x) = q(x) + \lambda(x)\max\left\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right\} + \frac{1}{\varepsilon}\max\left\{\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right\}^2.$$
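The exactness property just proved ($P(\bar x) = q(\bar x)$ at a KKT pair) and the gradient expression (18) can be illustrated on a toy instance. The sketch below is an assumption-laden illustration, not the paper's implementation: it takes $q(x) = \frac{1}{2}x^TQx + c^Tx$ with $Q = I$, $c = (1,0)$, $a = 1$ (so the global minimizer is $\bar x = (-1,0)$ with multiplier $\lambda(\bar x) = 0$), and uses the multiplier function $\lambda(x) = -(x^TQx + c^Tx)/(2a^2)$ consistent with the derivation above.

```python
import numpy as np

# Toy instance (an illustrative assumption, not from the paper).
Q = np.eye(2); c = np.array([1.0, 0.0]); a = 1.0; eps = 0.05

q    = lambda x: 0.5 * x @ Q @ x + c @ x
lam  = lambda x: -(x @ Q @ x + c @ x) / (2 * a**2)       # assumed multiplier function
glam = lambda x: -(2 * Q @ x + c) / (2 * a**2)           # its gradient
mx   = lambda x: max(x @ x - a**2, -(eps / 2) * lam(x))  # the max term

def P(x):
    # Piecewise quartic merit function, as rewritten at the end of the proof.
    return q(x) + lam(x) * mx(x) + (1.0 / eps) * mx(x)**2

def gradP(x):
    # Gradient expression (18).
    m = mx(x)
    return (Q @ x + c) + 2 * lam(x) * x + glam(x) * m + (4.0 / eps) * x * m

xbar = np.array([-1.0, 0.0])
print(mx(xbar), P(xbar) - q(xbar))  # both vanish at the KKT pair (Lemma A.1)

# Finite-difference check of (18) at a point away from the kink of the max term.
x0, h = np.array([0.5, 0.3]), 1e-6
fd = np.array([(P(x0 + h * e) - P(x0 - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(fd, gradP(x0), atol=1e-5))
```

At $\bar x$ the max term is exactly zero, so $P(\bar x) = q(\bar x) = -\tfrac{1}{2}$, and the finite-difference gradient matches (18) wherever the max is attained strictly by one of its two arguments.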
