
Newton's Method for Quadratic Stochastic Programs with Recourse^1

Xiaojun Chen, Liqun Qi and Robert S. Womersley
School of Mathematics
University of New South Wales
P.O. Box 1, Kensington NSW 2033, Australia
(March 1994)

Abstract. Quadratic stochastic programs (QSP) with recourse can be formulated as nonlinear convex programming problems. By attaching a Lagrange multiplier vector to the nonlinear convex program, a QSP is written as a system of nonsmooth equations. A Newton-like method for solving the QSP is proposed, and global convergence and local superlinear convergence of the method are established. The current method is more general than previous methods, which were developed for box-diagonal and fully quadratic QSP. Numerical experiments are given to demonstrate the efficiency of the algorithm, and to compare the use of Monte-Carlo rules and lattice rules for multiple integration in the algorithm.

Keywords: Newton's method, quadratic stochastic programs, nonsmooth equations.

Short title: Newton's method for stochastic programs

^1 This work is supported by the Australian Research Council.

1. Introduction

Let $P \in R^{n \times n}$ be symmetric positive semi-definite and $H \in R^{m \times m}$ be symmetric positive definite. We consider two-stage quadratic stochastic programs with fixed recourse [19,20]:

$$\min_{x \in R^n} \ \tfrac{1}{2}x^T P x + c^T x + \Phi(x) \quad \text{subject to } Ax \le b, \tag{1.1}$$

where
$$\Phi(x) = \int_\Omega \psi(x,\omega)\,\rho(\omega)\,d\omega$$
and
$$\psi(x,\omega) = \max_{z \in R^m} \ -\tfrac{1}{2}z^T H z + z^T(h(\omega) - Tx) \quad \text{subject to } Wz \le q.$$

Here $c \in R^n$, $A \in R^{r \times n}$, $b \in R^r$, $T \in R^{m \times n}$, $q \in R^{m_1}$ and $W \in R^{m_1 \times m}$ are fixed matrices, $\omega \in R^{m_2}$ is a random vector with support $\Omega \subseteq R^{m_2}$, $\rho$ is a probability density function on $R^{m_2}$, and $h(\cdot) \in R^m$ is a random vector.

By introducing a new variable $y$, an equivalent form of (1.1) is

$$\min_{x \in R^n,\ y \in R^m} \ \tfrac{1}{2}x^T P x + c^T x + \Psi(y) \quad \text{subject to } Ax \le b,\ Tx - y = 0, \tag{1.2}$$

where
$$\Psi(y) = \int_\Omega g(y,\omega)\,\rho(\omega)\,d\omega,$$
$$g(y,\omega) = \max_{z \in R^m} \ -\tfrac{1}{2}z^T H z + z^T(h(\omega) - y) \quad \text{subject to } Wz \le q.$$

Since $H$ is symmetric positive definite, the function $\Psi$ is convex and once continuously differentiable. Calculating $\Psi$ involves multi-dimensional integrals and quadratic programs. Problem (1.2) is useful because it is a convex program in which the computational difficulties occur primarily in the evaluation of $\Psi$ ($m$ variables), and usually $m \ll n$. See [7, 8].

Since it is impossible to demand the exact evaluation of the function $\Psi$ and its gradient, we consider approximate problems of the form

$$\min_{x \in R^n,\ y \in R^m} \ \tfrac{1}{2}x^T P x + c^T x + f(y) \quad \text{subject to } Ax \le b,\ Tx - y = 0, \tag{1.3}$$

where
$$f(y) = \sum_{i=1}^N \alpha_i\, g(y,\omega^i)\,\bar\rho(\omega^i),$$
$$g(y,\omega^i) = \max_{z \in R^m} \ -\tfrac{1}{2}z^T H z + z^T(h(\omega^i) - y) \quad \text{subject to } Wz \le q.$$

The function $\bar\rho$ involves $\rho$ and a transformation used to go from the integral on $\Omega$ to the unit cube $[0,1]^{m_2}$. The weights $\{\alpha_i\}_{i=1}^N$ and points $\{\omega^i\}_{i=1}^N$ are generated by a multidimensional numerical integration rule. The $\alpha_i, \omega^i$ are independent of $y$. In Section 4 we discuss both Monte-Carlo methods and lattice methods for approximating $\Psi$. The aim is to develop methods which are applicable to problems where the dimension $m_2$ of the integral is large ($\ge 5$). Lattice methods [21, 4] are promising methods for multi-dimensional integration and, to our knowledge, have not been used in stochastic programming before. In both Monte-Carlo and lattice methods equal weights are chosen, that is $\alpha_i = 1/N$, $i = 1,2,\ldots,N$.

Let
$$X = \{x \in R^n \mid Ax \le b\} \quad\text{and}\quad Z = \{z \in R^m \mid Wz \le q\}$$
be nonempty polyhedra.

Since $H$ is symmetric positive definite, $f$ is a differentiable convex function defined on the whole space $R^m$.

Problem (1.3) can be considered as an Extended Linear-Quadratic Programming (ELQP) problem as introduced by Rockafellar and Wets [19, 20]. If both $P$ and $H$ are positive definite, the problem is called fully quadratic. If both $P$ and $H$ are diagonal, and both $X$ and $Z$ are box regions defined by simple lower and upper bounds on the variables, the problem is called

box-diagonal. Several numerical methods have been developed for solving fully quadratic and box-diagonal ELQP problems [14, 18-20, 23, 24]. However, most of these methods are less efficient in the general case. Even when the problems are fully quadratic, only linear convergence rates were established. Recently, Qi and Womersley [15] presented a sequential quadratic programming method for the box-diagonal case and showed that the rate of convergence of their method is superlinear. Although the algorithm in [15] is not in principle restricted to the box-diagonal case, it used explicit expressions for derivative information which are only available in the box-diagonal case. When $Z$ is a box, the problem corresponds to the simple recourse problem, which is relatively easy. The general case is typically very hard [5,12].

In [15, 18-20, 24] the dual objective is evaluated to provide a duality gap for a stopping criterion. The current algorithm does not evaluate the dual objective, as this would involve the solution of a first-stage quadratic programming problem with a potentially large number of variables.

In this paper we present a new method which is efficient in the general case and achieves both global convergence and superlinear convergence. Furthermore, we do not need to calculate the dual problem.

Assume that there exists an optimal solution $(x^*, y^*)$ of (1.3). According to Theorem 28.2 in [17], there exist optimal Lagrange multiplier vectors $\lambda^* \in R^r$ and $p^* \in R^m$ associated with the constraints $Ax \le b$ and $Tx - y = 0$, respectively. From the Kuhn-Tucker conditions for (1.3), we then have

$$0 = Px^* + c + A^T\lambda^* + T^T p^*$$
$$0 = \nabla f(y^*) - p^*$$
$$0 = Tx^* - y^*$$
$$Ax^* \le b, \quad \lambda^* \ge 0, \quad {\lambda^*}^T(Ax^* - b) = 0.$$

Let $\theta = (x, y, \lambda, p)$ and $M = n + 2m + r$. Then the Kuhn-Tucker conditions for (1.3) can be stated as the following system of nonsmooth equations in the variable $\theta$ [10]:

$$F(\theta) \equiv \begin{pmatrix} Px + c + A^T\lambda + T^T p \\ \nabla f(y) - p \\ \min(b - Ax,\ \lambda) \\ Tx - y \end{pmatrix} = 0, \tag{1.4}$$

where "min" denotes the componentwise minimum operator on a pair of vectors. The nonsmooth function $F$ is a mapping from $R^M$ into itself.
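For concreteness, the residual map (1.4) is straightforward to assemble once $\nabla f$ is available. The following Python sketch is an illustration, not the authors' Matlab code; the callable grad_f standing in for $\nabla f$ is a hypothetical helper.

```python
import numpy as np

def F(theta, P, c, A, b, T, grad_f):
    """Residual of the nonsmooth KKT system (1.4) (illustrative sketch).

    theta = (x, y, lam, p) concatenated; grad_f is a caller-supplied
    callable returning the gradient of the sample-average term f.
    """
    n, m, r = P.shape[0], T.shape[0], A.shape[0]
    x = theta[:n]
    y = theta[n:n + m]
    lam = theta[n + m:n + m + r]
    p = theta[n + m + r:]
    return np.concatenate([
        P @ x + c + A.T @ lam + T.T @ p,   # stationarity in x
        grad_f(y) - p,                     # stationarity in y
        np.minimum(b - A @ x, lam),        # complementarity via componentwise min
        T @ x - y,                         # linking constraint
    ])
```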

If $\theta^*$ is a solution of (1.4), then the Kuhn-Tucker conditions for (1.3) hold at $(x^*, y^*)$. Since (1.3) is a convex program and the objective and the constraints of (1.3) are $C^1$ functions, by Theorem 9.4.2 in [3], $(x^*, y^*)$ is a global solution of (1.3).

In Section 2, we show that $F$ is globally Lipschitz on $R^M$. By Rademacher's theorem, $F$ is differentiable almost everywhere in $R^M$. Let $D_F$ be the set where $F$ is differentiable. We use the generalized Jacobian defined in [13]:

$$\partial_B F(\theta) = \Big\{ \lim_{\theta^k \to \theta,\ \theta^k \in D_F} \nabla F(\theta^k) \Big\}. \tag{1.5}$$

We give a $V \in \partial_B F(\theta)$ for $\theta \in R^M$ by using Pang's results on the projection function [10].

In Section 3, we use the generalized Jacobian $V \in \partial_B F(\theta)$ given in Section 2 to present an algorithm. The algorithm achieves global convergence and superlinear local convergence.

Let $\|\cdot\|$ denote the Euclidean norm.

When we get a solution of (1.3), two basic questions arise: how good an estimate is the optimal value of (1.3) for the optimal value of (1.2), and how well do the solutions of (1.3) approximate the solutions of (1.2) [6]. In Section 4, while solving (1.3) by our algorithm, we consider which integration rule can provide a sharper estimate for $|\Psi(y) - f(y)|$ and minimize the number of integrand evaluations. We give numerical experiments to demonstrate the efficiency of our algorithm and to compare the use of Monte-Carlo rules and lattice rules for multiple integration in the algorithm.

2. Generalized Jacobians

In this section we show that $\nabla f$ is a sum of projection functions, so $F$ is globally Lipschitz on $R^M$. The Lipschitz property of $F$ can be viewed as a consequence of the results of [11]. However, the following results emphasize the special structure in terms of projection functions, which is used in Theorem 2.1 and in the algorithm. Furthermore, we give an element of the generalized Jacobian $\partial_B F(\theta)$ for $\theta \in R^M$.

Proposition 2.1. Let $\omega^i \in R^{m_2}$ be fixed, $i = 1,2,\ldots,N$,
$$Q_i(y) = -\arg\max_z \{ -\tfrac{1}{2}z^T H z + (h(\omega^i) - y)^T z : z \in Z \}$$

and
$$Q(y) = \frac{1}{N}\sum_{i=1}^N Q_i(y)\,\bar\rho(\omega^i).$$
Then $Q(y) = \nabla f(y)$.

Proof. Since $H$ is positive definite, for any $y \in R^m$, $\omega^i \in R^{m_2}$ there exists a unique $z^*(y,\omega^i)$ such that $z^*(y,\omega^i) = \arg\max\{-\tfrac{1}{2}z^T H z + (h(\omega^i) - y)^T z : z \in Z\}$. By Theorem 27.1 in [17] on convex conjugate functions,
$$g(y,\omega^i) = \max_z \{ -\tfrac{1}{2}z^T H z + (h(\omega^i) - y)^T z : z \in Z \}$$
is differentiable at $y$ and
$$\frac{\partial g}{\partial y}(y,\omega^i) = -z^*(y,\omega^i).$$
Hence we have
$$\nabla f(y) = \frac{1}{N}\sum_{i=1}^N \frac{\partial g}{\partial y}(y,\omega^i)\,\bar\rho(\omega^i) = -\frac{1}{N}\sum_{i=1}^N z^*(y,\omega^i)\,\bar\rho(\omega^i) = \frac{1}{N}\sum_{i=1}^N Q_i(y)\,\bar\rho(\omega^i) = Q(y). \qquad \Box$$

Let $\varphi(x,\lambda) = \min(b - Ax,\ \lambda)$. Using the construction in [13], we can give an element of the generalized Jacobian $\partial_B\varphi(x,\lambda)$. Define
$$\tilde A(x,\lambda) = \begin{pmatrix} \tilde a_1(x,\lambda) \\ \vdots \\ \tilde a_r(x,\lambda) \end{pmatrix} \in R^{r\times n} \quad\text{and}\quad \Lambda(x,\lambda) = \begin{pmatrix} \Lambda_1(x,\lambda) \\ \vdots \\ \Lambda_r(x,\lambda) \end{pmatrix} \in R^{r\times r},$$
where
$$\tilde a_i(x,\lambda) = \begin{cases} -a_i & \text{if } (b - Ax)_i \le \lambda_i \\ 0 & \text{otherwise} \end{cases}
\qquad\text{and}\qquad
\Lambda_i(x,\lambda) = \begin{cases} e_i & \text{if } \lambda_i < (b - Ax)_i \\ 0 & \text{otherwise} \end{cases}$$

for $i = 1,2,\ldots,r$. Here $a_i$ is the $i$-th row of $A$ and $e_i$ is the $i$-th row of the identity matrix $I \in R^{r\times r}$. Then $(\tilde A(x,\lambda), \Lambda(x,\lambda)) \in \partial_B\varphi(x,\lambda)$.

Definition 2.1. Let $S$ be a nonempty, closed and convex subset of $R^m$. The projection of $u \in R^m$ onto the set $S$, denoted $\Pi_S(u)$, is the unique solution of
$$\min_{s \in S} \|u - s\|.$$

Let $S$ be the polyhedron
$$S = \{ s \in R^m \mid Bs \le q \},$$
where $B \in R^{m_1\times m}$. For an arbitrary vector $u$, let $\bar B_u$ denote the submatrix of $B$ comprising the rows that correspond to the active inequalities of $Bs \le q$ at the vector $\Pi_S(u)$. Define the polyhedral cone
$$S(u) = \{ s \mid (\Pi_S(u) - u)^T s = 0,\ \bar B_u s \le 0 \}$$
and the lineality space of $S(u)$,
$$L(u) = S(u) \cap (-S(u)).$$

We summarize the properties of the projection function which are used in this paper.

Lemma 2.1.
(i) [16] $\Pi_S$ is a contraction, i.e., it is Lipschitz with modulus 1.
(ii) [10] $\Pi_S$ is everywhere directionally differentiable along any direction, and $\Pi'_S(u; d) = \Pi_{S(u)}(d)$. Furthermore, for any vector $h$ with $\|h\|$ sufficiently small,
$$\Pi_S(u + h) = \Pi_S(u) + \Pi_{S(u)}(h).$$
(iii) [10] $\Pi_S$ is F-differentiable at $u$ if and only if $\bar B_u \Pi_{S(u)}$ is identically zero; in this case, $\nabla\Pi_S(u) = \Pi_{L(u)}$. (Here $\bar B_u \Pi_{S(u)}$ denotes the composite map $(\bar B_u \Pi_{S(u)})(d) = \bar B_u(\Pi_{S(u)}(d))$.)

By Lemma 2.1 and Rademacher's theorem, $\Pi_S$ is differentiable almost everywhere. Let $D_{\Pi_S}$ be the set where $\Pi_S$ is differentiable.
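Both the projection of Definition 2.1 and the gradient formula of Proposition 2.1 reduce to small quadratic programs over the polyhedra $S$ and $Z$. A minimal sketch follows, assuming the cvxpy modeling package (an assumption; the paper's experiments used Matlab). The inputs h_samples (the points $h(\omega^i)$) and weights (the values $\bar\rho(\omega^i)$) are hypothetical names.

```python
import numpy as np
import cvxpy as cp

def project_polyhedron(u, B, q):
    """Pi_S(u): Euclidean projection of u onto S = {s : B s <= q}."""
    s = cp.Variable(len(u))
    cp.Problem(cp.Minimize(cp.sum_squares(u - s)), [B @ s <= q]).solve()
    return s.value

def grad_f(y, H, W, q, h_samples, weights):
    """Q(y) = grad f(y) = -(1/N) * sum_i z*(y, w^i) * rho_bar(w^i)
    (Proposition 2.1), solving each recourse QP directly."""
    m = H.shape[0]
    g = np.zeros(m)
    for h_i, w_i in zip(h_samples, weights):
        z = cp.Variable(m)
        cp.Problem(cp.Maximize(-0.5 * cp.quad_form(z, H) + (h_i - y) @ z),
                   [W @ z <= q]).solve()
        g -= w_i * z.value     # dg/dy (y, w^i) = -z*(y, w^i)
    return g / len(h_samples)
```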

Theorem 2.1. Let $B = WH^{-1/2}$. Then
$$S = \{ s \in R^m \mid Bs \le q \} = \{ s \mid s = H^{1/2} z,\ z \in Z \}$$
and

i) $Q_i$ is differentiable almost everywhere;

ii) $Q_i$ is F-differentiable at $y$ if and only if $\bar B_u \Pi_{S(u)}$ is identically equal to zero, where $u = H^{-1/2}(h(\omega^i) - y)$;

iii)
$$\partial_B Q_i(y) = H^{-1/2} \Big\{ \lim_{u^k \to u,\ u^k \in D_{\Pi_S}} \nabla\Pi_S(u^k) \Big\} H^{-1/2} = H^{-1/2} \Big\{ \lim_{u^k \to u,\ u^k \in D_{\Pi_S}} \Pi_{L(u^k)} \Big\} H^{-1/2},$$
where $u^k = H^{-1/2}(h(\omega^i) - y^k)$;

iv) Assume that $W$ has full row rank. Let
$$U_i(y) = \begin{cases} H^{-1} & \text{if } H^{-1/2} u_i \in \operatorname{int} Z \\ H^{-1/2}\,\Pi_{L(u_i)}\,H^{-1/2} & \text{otherwise,} \end{cases}$$
where $u_i = H^{-1/2}(h(\omega^i) - y)$. Then $U_i(y) \in \partial_B Q_i(y)$. Furthermore, $F(\theta)$ is globally Lipschitz on $R^M$ and
$$V_\theta = \begin{pmatrix} P & 0 & A^T & T^T \\ 0 & U(y) & 0 & -I \\ \tilde A(x,\lambda) & 0 & \Lambda(x,\lambda) & 0 \\ T & -I & 0 & 0 \end{pmatrix} \in \partial_B F(\theta), \tag{2.1}$$
where
$$U(y) = \frac{1}{N}\sum_{i=1}^N U_i(y)\,\bar\rho(\omega^i) \in \partial_B Q(y). \tag{2.2}$$

Proof. Let $\bar y$ be a vector in $R^m$ and let $v = h(\omega^i) - \bar y$. Then $\bar z = \arg\max\{-\tfrac{1}{2}z^T H z + z^T v,\ z \in Z\}$ if and only if for any $z \in Z$,
$$(z - \bar z)^T (v - H\bar z) \le 0.$$
Let $s = H^{1/2} z$. Then $\bar s = H^{1/2}\bar z$ if and only if for any $s \in S$,
$$(s - \bar s)^T (H^{-1/2} v - \bar s) \le 0.$$

Let $u = H^{-1/2} v$. By Definition 2.1, $\bar s$ is the projection of $u$ onto $S$, i.e. $\bar s = \Pi_S(u)$. Since $H^{1/2}\bar z = \bar s = \Pi_S(u) = \Pi_S(H^{-1/2} v)$, we have
$$Q_i(\bar y) = -\bar z = -H^{-1/2}\Pi_S(H^{-1/2}(h(\omega^i) - \bar y)).$$
Furthermore, if $u \in D_{\Pi_S}$, then $Q_i$ is differentiable at $\bar y$ and
$$\nabla Q_i(\bar y) = H^{-1/2}\,\nabla\Pi_S(H^{-1/2}(h(\omega^i) - \bar y))\,H^{-1/2} = H^{-1/2}\,\Pi_{L(u)}\,H^{-1/2}.$$
Hence (i)-(iii) follow from Lemma 2.1.

iv) Since $\nabla f = Q$ is a sum of projection functions, $\nabla f$ is globally Lipschitz on $R^m$. Since $\varphi(x,\lambda)$ is the componentwise minimum of a pair of linear functions, $\varphi$ is globally Lipschitz on $R^n \times R^r$. Therefore $F$ is globally Lipschitz on $R^M$. By Rademacher's theorem, $F$ is differentiable almost everywhere in $R^M$. Hence we can define the generalized Jacobian $\partial_B F(\theta)$ for any $\theta \in R^M$. Now we prove (2.1).

If $\bar z = H^{-1} v \in \operatorname{int} Z \ne \emptyset$, then $u = H^{-1/2} v \in \operatorname{int} S$ and $\bar B_u$ does not exist. Hence $Q_i$ is differentiable at $\bar y$ and $L(u) = R^m$. This implies $\nabla\Pi_S(u) = I$ and $\nabla Q_i(\bar y) = H^{-1}$.

Now we consider the case $H^{-1} v \notin \operatorname{int} Z$. For any $u \in R^m$, $\Pi_{L(u)}$ is the projection from $R^m$ onto the null space $L(u)$ of the matrix
$$\begin{pmatrix} (\Pi_S(u) - u)^T \\ \bar B_u \end{pmatrix}.$$
It is sufficient to prove that for any $u \in R^m$,
$$\Pi_{L(u)} \in \Big\{ \lim_{u^k \to u,\ u^k \in D_{\Pi_S}} \Pi_{L(u^k)} \Big\},$$
that is, there is a sequence $\{u^k\} \subset D_{\Pi_S}$ such that
$$\Pi_{L(u)} = \lim_{k\to\infty} \Pi_{L(u^k)}. \tag{2.3}$$
Let $J_0 = \{i_1, i_2, \ldots, i_{j_0}\}$ be the index set such that $(Bu)_{i_j} = q_{i_j}$, $j = 1,2,\ldots,j_0$. Since $B$ has full row rank, there is an $n\times m$ matrix $E$ such that $BE(BE)^T = I$. Without loss of generality we may assume that $BB^T = I$.

Consider the case $j_0 = 0$. In this case, there exists a neighborhood $N_u$ of $u$ such that for any $u + h \in N_u$, $\bar B_u = \bar B_{u+h}$. Let $\bar s \in S(u)$. Then there is a small positive number $\varepsilon$ such that $\bar h = \varepsilon\bar s \in S(u)$ and $u + \bar h \in N_u$.

Furthermore, for any $s \in S(u)$,
$$(\Pi_S(u + \bar h) - (u + \bar h))^T s = (\Pi_S(u) + \Pi_{S(u)}(\bar h) - (u + \bar h))^T s = (\Pi_{S(u)}(\bar h) - \bar h)^T s = (\bar h - \bar h)^T s = 0.$$
On the other hand, for any $s \in S(u + \bar h)$,
$$(\Pi_S(u) - u)^T s = (\Pi_S(u + \bar h) - \Pi_{S(u)}(\bar h) - (u + \bar h - \bar h))^T s = (-\Pi_{S(u)}(\bar h) + \bar h)^T s = (-\bar h + \bar h)^T s = 0.$$
Hence we have $S(u) = S(u + \bar h)$. By Lemma 2.1, we have
$$\Pi_S(u + \bar h) = \Pi_S(u) + \Pi_{S(u)}(\bar h)$$
and
$$\Pi_S(u) = \Pi_S(u + \bar h) + \Pi_{S(u+\bar h)}(-\bar h).$$
Hence
$$\bar h = \Pi_{S(u)}(\bar h) = -\Pi_{S(u+\bar h)}(-\bar h) = -\Pi_{S(u)}(-\bar h).$$
Thus $-\bar h = -\varepsilon\bar s \in S(u)$. Therefore we must have $\bar B_u \bar s = 0$. Since $\bar s$ is an arbitrary element of $S(u)$, we have $\bar B_u \Pi_{S(u)} \equiv 0$. By Lemma 2.1, $\Pi_S$ is differentiable at $u$, so that $\nabla\Pi_S(u) = \Pi_{L(u)}$.

Now we consider the case $j_0 \ge 1$. Take a sequence $\{u^k\}$ satisfying
$$u^k = \arg\min_\xi \{ \|\xi - u\| : (B\xi)_i = (q^k)_i,\ i \in J_0 \},$$
where
$$(q^k)_i = \begin{cases} q_i & \text{if } i \notin J_0 \\ q_i + \varepsilon_k & \text{if } i \in J_0, \end{cases}$$
and $\varepsilon_k > 0$. The assumption that $W$ has full row rank ensures that the set $\{\xi \mid (B\xi)_i = (q^k)_i,\ i \in J_0\}$ is not empty.

Clearly $\bar B_{u^k} = \bar B_u$ and $(Bu^k)_i \ne q_i$, $i = 1,\ldots,m_1$. Hence $\{u^k\} \subset D_{\Pi_S}$ and $u^k \to u$ as $\varepsilon_k \to 0$. Furthermore, since $\Pi_S$ is globally Lipschitz, $\lim_{u^k\to u}\Pi_S(u^k) = \Pi_S(u)$; we have (2.3) and $H^{-1/2}\Pi_{L(u)}H^{-1/2} \in \partial_B Q_i(\bar y)$.

Now we prove (2.2). Since the $Q_i$, $i = 1,2,\ldots,N$, are piecewise smooth, each $Q_i$ is almost everywhere differentiable. If all but at most one of the $Q_i$ are differentiable at $y$, then (2.2) holds. Hence it suffices to prove the case in which two functions,

$Q_{i_1}$ and $Q_{i_2}$, are nondifferentiable at $y$; the general case follows by induction. Let $u_1 = H^{-1/2}(h(\omega^{i_1}) - y)$ and $u_2 = H^{-1/2}(h(\omega^{i_2}) - y)$. Let $\tilde B_1 \in R^{j_1\times m}$ and $\tilde B_2 \in R^{j_2\times m}$ denote the submatrices of $B$ comprising the rows for which $Bu_1 = q$ and $Bu_2 = q$, respectively. Since $B$ has full row rank, any row of $\begin{pmatrix}\tilde B_1 \\ \tilde B_2\end{pmatrix}$ is linearly independent of the others or equal to one of them. Hence for any $\varepsilon_k > 0$ there is a $\Delta u^k$ such that
$$\begin{pmatrix}\tilde B_1 \\ \tilde B_2\end{pmatrix}\Delta u^k = \varepsilon_k \tilde e, \qquad \tilde e = (1,1,\ldots,1) \in R^{j_1+j_2}.$$
Hence we may choose a sequence $\{\Delta u^k\}$ such that $\Delta u^k \to 0$ as $k \to \infty$, $(B(u_1 + \Delta u^k))_i \ne q_i$ and $(B(u_2 + \Delta u^k))_i \ne q_i$, $i = 1,\ldots,m_1$. This implies
$$\Pi_{L(u_1+\Delta u^k)} + \Pi_{L(u_2+\Delta u^k)} = \nabla\Pi_S(u_1 + \Delta u^k) + \nabla\Pi_S(u_2 + \Delta u^k) = \nabla\big(\Pi_S(u_1 + \Delta u^k) + \Pi_S(u_2 + \Delta u^k)\big)$$
and
$$H^{-1/2}\Pi_{L(u_1)}H^{-1/2} + H^{-1/2}\Pi_{L(u_2)}H^{-1/2} \in \partial_B(Q_{i_1}(y) + Q_{i_2}(y)). \qquad \Box$$

Remark 2.1. Let
$$\tilde B_u = \begin{pmatrix} (\Pi_S(u) - u)^T \\ \bar B_u \end{pmatrix}.$$
Then $\Pi_{L(u)}$ is the projection from $R^m$ onto the null space $\mathcal{N}(\tilde B_u)$ of $\tilde B_u$. We can compute $\Pi_{L(u)}$ by QR decomposition. Let $\tilde B_u$ be an $l\times m$ matrix with $\operatorname{rank}(\tilde B_u) = r > 0$. Let $\tilde B_u^T = QR$ be a QR decomposition of $\tilde B_u^T$, where $Q = (Q_1, Q_2)$ is an orthogonal matrix of order $m\times m$ ($Q_1 \in R^{m\times r}$, $Q_2 \in R^{m\times(m-r)}$) and $R = (\bar R^T, 0)^T$ is an upper triangular matrix of order $m\times l$ ($\bar R \in R^{r\times l}$). Then we have
$$\Pi_{L(u)} = Q_2 Q_2^T.$$
We can also obtain the projection from the singular value decomposition.

Definition 2.2 [13]. Let $G: R^m \to R^m$ be a locally Lipschitzian mapping. (The symbol for the mapping is lost in this transcript; $G$ is used here.) We say that $G$ is semismooth at $y$ if
$$\lim_{\substack{U \in \partial_B G(y + t d'),\ d' \to d,\ t \downarrow 0}} \{U d'\}$$
exists for any $d \in R^m$.
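Both the QR construction of Remark 2.1 and the block matrix (2.1) are mechanical to assemble. A small NumPy sketch follows; it is illustrative only, and U(y) is taken as given (built elsewhere from the projectors $\Pi_{L(u_i)}$ as in Theorem 2.1 iv).

```python
import numpy as np

def proj_L(B_tilde):
    """Projector onto null(B_tilde) via QR, as in Remark 2.1.
    Assumes B_tilde (l x m) has full row rank; otherwise use an SVD."""
    l, m = B_tilde.shape
    Q, _ = np.linalg.qr(B_tilde.T, mode='complete')  # B_tilde^T = Q R
    Q2 = Q[:, l:]                                    # basis of the null space
    return Q2 @ Q2.T

def V_theta(x, lam, P, A, b, T, U):
    """Assemble the generalized Jacobian (2.1); U is U(y) from (2.2)."""
    n, m, r = P.shape[0], U.shape[0], A.shape[0]
    res = b - A @ x
    A_til = np.zeros((r, n))
    Lam = np.zeros((r, r))
    for i in range(r):
        if res[i] <= lam[i]:
            A_til[i] = -A[i]   # min picks (b - Ax)_i (ties resolved here)
        else:
            Lam[i, i] = 1.0    # min picks lambda_i
    return np.block([
        [P,                np.zeros((n, m)), A.T,              T.T],
        [np.zeros((m, n)), U,                np.zeros((m, r)), -np.eye(m)],
        [A_til,            np.zeros((r, m)), Lam,              np.zeros((r, m))],
        [T,                -np.eye(m),       np.zeros((m, r)), np.zeros((m, m))],
    ])
```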

Proposition 2.2. If $H^{-1}(h(\omega^i) - y) \notin Z$, then at least one matrix in $\partial_B Q_i(y)$ is singular.

Proof. By Lemma 2.1, $Q_i$ is everywhere directionally differentiable along any direction and $\Pi'_S(u; d) = \Pi_{S(u)}d$. Let $u = H^{-1/2}(h(\omega^i) - y)$. Then $u \notin S$. Obviously, $0 \in S(u)$. Let $d = \Pi_S(u) - u$. Then $d \ne 0$ and $s^T d = 0$ for any $s \in S(u)$. Hence $\Pi'_S(u; d) = \Pi_{S(u)}d = 0$. Since $\Pi_S$ is piecewise smooth, $\Pi_S$ is semismooth. By Lemma 2.1 in [13], there exists a $U \in \partial_B\Pi_S(u)$ such that $\Pi'_S(u; d) = Ud = 0$. This implies that $U$ is singular and that $H^{-1/2}UH^{-1/2} \in \partial_B Q_i(y)$ is singular. $\Box$

3. Algorithm and Convergence

Using the generalized Jacobian given in Theorem 2.1, we can solve problem (1.3) via the nonsmooth equations (1.4) by two-stage methods which achieve global convergence and superlinear local convergence [13]. In this paper, we consider Newton's method (the sequential quadratic programming method) (cf. [11, 14, 15]).

Let $u = (x, y)^T$ and $d = (d_x, d_y)^T$. Denote the objective function in (1.3) by
$$\phi(u) = \tfrac{1}{2}x^T P x + c^T x + f(y)$$
and a quadratic function which is an approximation to $\phi(u^k + d)$ by
$$\phi_k(u^k + d) = \nabla\phi(u^k)^T d + \tfrac{1}{2}d_x^T(P + \delta_k I)d_x + \tfrac{1}{2}d_y^T U(y^k) d_y,$$
where $\delta_k$ is a scalar.

Algorithm 3.1. Choose $\beta, \sigma \in (0,1)$, $\epsilon > 0$, $x^0 \in X$. Let $y^0 = Tx^0$. For $k \ge 0$: let $\delta_k > 0$.

1. Solve the quadratic program
$$\min_d \ \phi_k(u^k + d) \quad \text{subject to } A(x^k + d_x) \le b,\ Td_x - d_y = 0. \tag{3.1}$$
Let $d^k$ be the unique optimal solution of (3.1).

2. Let $\lambda^k$ and $p^k$ be the Lagrange multipliers at the solution of (3.1) corresponding to $A(x^k + d_x) \le b$ and $Td_x = d_y$, respectively. Let $\tilde\theta^k = (u^k + d^k, \lambda^k, p^k)^T$. Calculate $F(\tilde\theta^k)$. If $\|F(\tilde\theta^k)\| \le \epsilon$, stop; otherwise go to step 3.

3. Let $i_k$ be the minimum integer $i \ge 0$ such that
$$\phi(u^k + \beta^i d^k) \le \phi(u^k) + \frac{\sigma}{2}\beta^i\nabla\phi(u^k)^T d^k.$$
Let $u^{k+1} = u^k + \beta^{i_k} d^k$.

To consider the uniqueness of the solution of (1.3) and the superlinear convergence rate of Algorithm 3.1, we use the definition of B-regularity from [11]. Let $\hat\phi(x) = \phi(x, Tx) = \tfrac{1}{2}x^T P x + c^T x + f(Tx)$. Then the function $\hat\phi$ is B-regular at $x^*$ if for any $V(Tx^*) \in \partial_B Q(Tx^*)$, the matrix $P + T^T V(Tx^*) T$ is positive definite. Clearly, if $P$ is positive definite, so that problem (1.3) is fully quadratic, then $\hat\phi$ is B-regular at any point $x \in R^n$.

The following convergence theorem is based on the papers [11, 14, 15]. Paper [14] proved the local superlinear convergence of an approximate Newton method (cf. Algorithm 3.1 without step 3 and with $i_k = 0$) for solving LC^1 optimization problems. Paper [15] was an application of [14] to the ELQP problem. Paper [11] was a globalization of such a Newton method for LC^1 minimization problems.

To consider the convergence of Algorithm 3.1, we set $\epsilon = 0$ in the algorithm.
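As a concrete illustration of steps 1-3, the sketch below performs one iteration in the reduced variable $d_x$ (with $d_y = Td_x$ eliminated, as in the proof of Theorem 3.1 below). It assumes the cvxpy package for the QP subproblem; grad_f, U and phi are caller-supplied helpers (hypothetical names, as in the earlier sketches), and this is a sketch rather than the authors' Matlab implementation.

```python
import numpy as np
import cvxpy as cp

def newton_sqp_step(xk, P, c, A, b, T, grad_f, U, phi, delta,
                    beta=0.5, sigma=0.5):
    """One iteration of Algorithm 3.1 in the reduced variables d_x."""
    n = P.shape[0]
    yk = T @ xk
    Gk = P + delta * np.eye(n) + T.T @ U(yk) @ T   # model Hessian
    gk = P @ xk + c + T.T @ grad_f(yk)             # gradient of phi at u^k
    dx = cp.Variable(n)
    qp = cp.Problem(cp.Minimize(gk @ dx + 0.5 * cp.quad_form(dx, Gk)),
                    [A @ (xk + dx) <= b])          # subproblem (3.1)
    qp.solve()
    dxk = dx.value
    t = 1.0                                        # backtracking, step 3
    while phi(xk + t * dxk) > phi(xk) + 0.5 * sigma * t * (gk @ dxk):
        t *= beta
    return xk + t * dxk
```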

Theorem 3.1. (i) (Global convergence) Let $\bar\delta$ and $c_0$ be positive scalars. Suppose that $\delta_k \in [0,\bar\delta]$ is chosen so that the smallest eigenvalue of $P + T^T U(y^k)T + \delta_k I$ is greater than $c_0$ for all large $k$. Then every accumulation point of the sequence $\{u^k\}$ produced by Algorithm 3.1, if one exists, is an optimal solution of (1.3).

(ii) (Local superlinear convergence) Suppose that the sequence $\{u^k\}$ produced by Algorithm 3.1 has an accumulation point $u^*$, that $U(y^k) \in \partial_B Q(y^k)$, and that $\hat\phi$ is B-regular at $x^*$. Then

(a) $u^*$ is the unique optimal solution of (1.3).

(b) Let $\{\delta_k\}$ be a sequence of positive scalars converging to zero. Then there exists an integer $k_0$ such that $i_k = 0$ for all $k \ge k_0$, and the sequence $\{u^k\}$ converges to $u^*$ at least Q-superlinearly, i.e.
$$\lim_{k\to\infty} \frac{\|u^{k+1} - u^*\|}{\|u^k - u^*\|} = 0. \tag{3.2}$$

Proof. By Theorem 2.1, the function $\phi$ is Fréchet-differentiable on $R^{n+m}$ and the gradient function $\nabla\phi$ is globally Lipschitz on $R^{n+m}$. Hence (1.3) is an LC^1 minimization problem. The quadratic program (3.1) is equivalent to
$$\min_{d_x} \ (Px^k + c + T^T\nabla f(y^k))^T d_x + \tfrac{1}{2}d_x^T(P + \delta_k I + T^T U(y^k)T)d_x \quad \text{subject to } A(x^k + d_x) \le b.$$
Hence $d^k = (d_{x^k}, Td_{x^k})^T$ is the unique optimal solution of (3.1). Furthermore, for any $d_x \in R^n$,
$$d_x^T(P + \delta_k I + T^T U(y^k)T)d_x \ge c_0\|d_x\|^2.$$
By Theorem 2 in [11], (i) holds.

Let $G(u) = \nabla\phi(u)$. By Theorem 2.1 and Lemma 2.1,
$$G'(u; d) = \Big(Pd_x,\ \frac{1}{N}\sum_{i=1}^N H^{-1/2}\Pi_{S(u_i)}(H^{-1/2}d_y)\Big),$$
where $u_i = H^{-1/2}(h(\omega^i) - y)$. Hence $G'(u; \cdot)$ is Lipschitzian. By Lemma 2.1 in [13], for any $u \in R^{n+m}$ there is a $V \in \partial_B G(u)$ such that
$$G'(u; d) = Vd.$$
Hence we can show that $u^*$ is the unique solution of (1.3) by the technical detail in Theorem 2.2 in [15].

Since the projection function $\Pi_S$ is piecewise smooth, $\Pi_S$ is semismooth. Hence $\nabla\phi$ is semismooth. By Theorem 3 in [11], we have (b). $\Box$

Remark 3.1. Since $Q$ is piecewise smooth, $U(y) \in \partial_B Q(y)$ holds almost everywhere in $R^m$. If $W$ has full row rank, then $U(y) \in \partial_B Q(y)$ holds.

However, in general, even if $Z$ is a box, there can be degenerate cases where $U(y) \notin \partial_B Q(y)$.

4. Numerical Experiments

In this section, we give three examples. The first example is chosen so that the points where $g$ has nonsmooth first derivatives can be analytically determined. The second example demonstrates the efficiency of Algorithm 3.1 for problem (1.3). The third example tests Algorithm 3.1 with the use of Monte-Carlo methods and lattice methods. The numerical experiments were performed using Matlab on a DEC 5000 workstation.

Example 1. Consider problem (1.1) in which $X = R^2$, $H = I \in R^{2\times 2}$, $T = -H$,
$$P = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, \quad W = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}, \quad q = \begin{pmatrix} 3 \\ 3 \end{pmatrix}.$$
The point $\tilde x^* = (3,1)^T$ is made the optimal solution of (1.3) by taking $\tilde y^* = T\tilde x^*$ and
$$c = -P\tilde x^* - \frac{1}{N}T^T\sum_{i=1}^N \nabla g(\tilde y^*, \omega^i)\,\bar\rho(\omega^i) = \begin{pmatrix} -9.02902247175663 \\ -0.98984189153187 \end{pmatrix},$$
where the probability density is the normal density, $h(\omega) = \omega$, and the points $\{\omega^i\}_{i=1}^N$ are generated by the lattice rule in $R^2$ with $N = 80044$.

Let $v = \omega - Tx$. The value of the function
$$g(v) = \max_z\{-\tfrac{1}{2}z^T H z + v^T z,\ z \in Z\}$$
is attained at the point
$$z^*(v) = \begin{cases} v & \text{if } v \in R_1 = \{v \mid Wv \le q\} \\ (2,1) & \text{if } v \in R_2 = \{v \mid \bar Wv \ge \bar q\} \\ v - \dfrac{c_1}{\|w_1\|^2}w_1 & \text{if } v \in R_3 = \{v \mid (Wv)_1 \ge q_1 \ \text{and}\ (\bar Wv)_1 \le \bar q_1\} \\ v - \dfrac{c_2}{\|w_2\|^2}w_2 & \text{if } v \in R_4 = \{v \mid (Wv)_2 \ge q_2 \ \text{and}\ (\bar Wv)_2 \le \bar q_2\}, \end{cases}$$
where $c = Wv - q$, $w_1$ and $w_2$ are the rows of $W$, and
$$\bar W = \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad \bar q = \begin{pmatrix} 1 \\ 4 \end{pmatrix}.$$
(The labels of $R_3$ and $R_4$ are swapped in the source relative to the points listed below; they are given here in the order consistent with those points.)
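The closed form above translates directly into code; a short sketch follows (illustrative only; on shared region boundaries the branches agree by continuity, so the order of the tests is immaterial).

```python
import numpy as np

W    = np.array([[1.0, 1.0], [2.0, -1.0]])
q    = np.array([3.0, 3.0])
Wbar = np.array([[1.0, -1.0], [1.0, 2.0]])
qbar = np.array([1.0, 4.0])

def z_star(v):
    """Closed-form maximizer z*(v) of Example 1 (H = I)."""
    Wv, Wbv = W @ v, Wbar @ v
    if np.all(Wv <= q):                       # R1: no constraint active
        return v
    if np.all(Wbv >= qbar):                   # R2: both constraints active
        return np.array([2.0, 1.0])
    cbar = Wv - q                             # c = Wv - q in the text
    if Wv[0] >= q[0] and Wbv[0] <= qbar[0]:   # R3: first constraint active
        w1 = W[0]
        return v - cbar[0] / (w1 @ w1) * w1
    w2 = W[1]                                 # R4: second constraint active
    return v - cbar[1] / (w2 @ w2) * w2

def g(v):
    z = z_star(v)
    return -0.5 * z @ z + v @ z
```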

Figure 1 plots $g(v)$ and the two components of $\nabla g(v)$ for $v \in D = \tilde x^* + [-4,4] \times [-4,4]$ with $N = 4952$.

[Figure 1: surface plots of $g(v)$, the first component of $z^*(v)$, and the second component of $z^*(v)$ over $D$.]

Clearly, $g(v)$ is once continuously differentiable, but it has no second derivative. We choose 4 starting points in the 4 regions, $(-1,-1) \in R_1$, $(5,2) \in R_2$, $(3,4) \in R_3$ and $(2,-2) \in R_4$, and test Algorithm 3.1 by solving 4 problems (1.3) with the 4 approximate solutions $\tilde x^*$: $(3,1) \in R_2$, $(3,2) \in R_2 \cap R_3$, $(1,2) \in R_1 \cap R_3$ and $(2,1) \in R_1 \cap R_2 \cap R_3 \cap R_4$. The numerical results with the convergence criterion $\|F(\theta^k)\| \le 5 \times 10^{-7}$ are shown in Table 1.

Example 2. Let $n = 20$, $r = 8$, $m = 4$, $m_1 = 2$ and $N = 10000$. Matrices $A \in R^{r\times n}$, $b \in R^r$, $c \in R^n$, $T \in R^{m\times n}$, $q \in R^{m_1}$, $W \in R^{m_1\times m}$, $P \in R^{n\times n}$, $H \in R^{m\times m}$ and the data $h(\omega^i) \in R^m$, $\bar\rho(\omega^i)$, $i = 1,\ldots,N$, are randomly selected. Meanwhile a solution of (1.3) with these matrices is generated. The numerical results with a random starting point and the convergence criterion $\|F(\theta^k)\| \le 10^{-8}$ are shown in Figure 2.

Table 1: The iteration number $k$, $\|x^k - \tilde x^*\|$, $|\phi(x^k) - \phi(\tilde x^*)|$ and $\|F(\theta^k)\|$.

  x~*      x0        k    ||x^k - x~*||   |phi(x^k) - phi(x~*)|   ||F(theta^k)||
  (3,1)    (5,2)     6    3.0042e-5       2.5601e-6               2.0729e-8
           (3,4)     7    3.0005e-5       2.5601e-6               1.9755e-8
           (-1,-1)   8    3.0039e-5       2.5601e-6               1.7920e-8
           (2,-2)    6    3.0222e-5       2.5601e-6               2.1935e-7
  (3,2)    (5,2)     6    4.5650e-6       2.8733e-6               1.4179e-7
           (3,4)     7    4.8014e-6       2.8733e-6               1.2249e-7
           (-1,-1)   8    4.3170e-6       2.8733e-6               4.1988e-7
           (2,-2)    7    4.9945e-6       2.8732e-6               3.3767e-7
  (1,2)    (5,2)    12    3.8905e-5       5.1846e-6               3.6060e-7
           (3,4)    13    3.9696e-5       5.1846e-6               4.7172e-7
           (-1,-1)  15    3.9460e-5       5.1846e-6               2.2323e-7
           (2,-2)   14    3.8933e-5       5.1846e-6               3.3142e-7
  (2,1)    (5,2)     7    5.2190e-5       6.5892e-8               1.3667e-7
           (3,4)     8    5.1899e-5       6.5892e-8               1.8472e-7
           (-1,-1)   9    5.2194e-5       6.5892e-8               1.4085e-7
           (2,-2)    8    5.1994e-5       6.5892e-8               7.9980e-8

Example 3. Since the error in passing from (1.2) to (1.3) arises only from the numerical integration error $|\Psi(y) - f(y)|$, and $\Psi$ is defined implicitly as the optimal value of an optimization problem, we consider choosing a numerical integration rule which offers savings in the number of function evaluations and also offers the possibility of error estimation. Monte-Carlo methods and lattice methods are two popular numerical integration rules. Monte-Carlo methods are based on estimating the mean value of the integrand sampled at points chosen from an appropriate statistical distribution. They are effective when the integrand function $g(y,\omega)\rho(\omega)$ is smooth with respect to $\omega$. However, they do not converge very fast, the rate of convergence being $O(N^{-1/2})$. Lattice methods are based on number theory. They converge faster and have a sharper error bound than Monte-Carlo methods. However, the integrand function is assumed to be 1-periodic in each of its $m_2$ variables, and the integration region is understood to be the half-open unit cube $[0,1)^{m_2}$. Monte-Carlo methods have been applied to stochastic programming recently; see [6]. However, it seems that lattice methods have not been used in this area before. We use a transformation function $\omega = q(t)$ suggested by Sloan to rewrite $\Psi$ as
$$\Psi(y) = \int_0^1 \cdots \int_0^1 g(y, q(t))\,\rho(q(t))\,q'(t)\,dt,$$
where $g(y,q(t))\rho(q(t))q'(t)$ is 1-periodic in each of its variables.

Let $\Omega = R^{m_2}$ and let $\rho$ be the normal density
$$\rho(\omega) = \frac{1}{(2\pi)^{m_2/2}|C|^{1/2}} \exp\{-\tfrac{1}{2}(\omega - \mu)^T C^{-1}(\omega - \mu)\},$$
where $\mu \in R^{m_2}$ is the mean value and $C \in R^{m_2 \times m_2}$ is the covariance matrix. Let $C = L^T L$ be the Cholesky factorization of $C$, $\sigma = |C|^{1/2}$ and $\omega = L\nu + \mu$. Then
$$\rho(\omega) = \frac{1}{(2\pi)^{m_2/2}\,\sigma} \exp(-\tfrac{1}{2}\nu^T\nu).$$
Without loss of generality, we chose the standard normal density, $C = I$ and $\mu = 0$. Let
$$\omega = \tan v, \tag{4.1}$$
$$v = \pi u - \frac{\pi}{2} \tag{4.2}$$
and
$$u = t - \frac{1}{2\pi}\sin 2\pi t. \tag{4.3}$$
The sequence of transformations (4.1)-(4.3) is used to go from an integral on $R^{m_2}$ to the integral of a 1-periodic function on the unit cube $[0,1)^{m_2}$. Note that each transformation is applied to each element of its vector argument. The use of a lattice rule requires the integrand function to be 1-periodic, and this is achieved by (4.3); a simple Monte-Carlo method does not require it. Transformations (4.1)-(4.3) are used as follows:
$$\Psi(y) = \int_{R^{m_2}} g(y,\omega)\rho(\omega)\,d\omega = \int_{-\pi/2}^{\pi/2} \cdots \int_{-\pi/2}^{\pi/2} g(y,\tan v)\,\rho(\tan v)\,\frac{1}{\cos^2 v_1 \cdots \cos^2 v_{m_2}}\,dv$$
$$= \pi^{m_2} \int_0^1 \cdots \int_0^1 g\big(y,\tan(\pi u - \tfrac{\pi}{2})\big)\,\rho\big(\tan(\pi u - \tfrac{\pi}{2})\big)\,\frac{1}{\cos^2(\pi u_1 - \tfrac{\pi}{2}) \cdots \cos^2(\pi u_{m_2} - \tfrac{\pi}{2})}\,du \tag{4.4}$$
$$= \pi^{m_2} \int_0^1 \cdots \int_0^1 g\big(y, \tan(\pi(t - \tfrac{1}{2\pi}\sin 2\pi t) - \tfrac{\pi}{2})\big)\,\rho\big(\tan(\pi(t - \tfrac{1}{2\pi}\sin 2\pi t) - \tfrac{\pi}{2})\big)\,\frac{(1 - \cos 2\pi t_1)\cdots(1 - \cos 2\pi t_{m_2})}{\cos^2(\pi(t_1 - \tfrac{1}{2\pi}\sin 2\pi t_1) - \tfrac{\pi}{2}) \cdots \cos^2(\pi(t_{m_2} - \tfrac{1}{2\pi}\sin 2\pi t_{m_2}) - \tfrac{\pi}{2})}\,dt. \tag{4.5}$$

A simple Monte-Carlo method [2] is used to approximate (4.4), by selecting $N$ points uniformly distributed in $[0,1]^{m_2}$ for use in (1.3).

A lattice rule [4, 21] is used to approximate (4.5): writing the integrand of (4.5) as $\Psi_2(y; t)$, the rule is
$$\frac{1}{2^{m_2}\ell} \sum_{k_{m_2}=0}^{1} \cdots \sum_{k_1=0}^{1} \sum_{j=0}^{\ell-1} \Psi_2\Big(y;\ \Big\{\frac{jz}{2\ell} + \frac{(k_1,\ldots,k_{m_2})}{2}\Big\}\Big),$$
where $\ell$ is an odd number. The braces indicate that each component of the vector is to be replaced by its fractional part, that is $\{\omega\} = \omega - [\omega]$, where $[\omega]$ denotes the largest integer which does not exceed $\omega$. A "good" lattice rule depends on a good choice of the vector $z$. Very recently Joe and Sloan (private communication) have proposed a table of recommended choices of $z$. For use in this paper, we quote a part of the table; see Appendix A.

Let $n = 20$, $r = 8$, $m = 4$, $m_1 = 2$. Matrices $A \in R^{r\times n}$, $b \in R^r$, $c \in R^n$, $T \in R^{m\times n}$, $q \in R^{m_1}$, $W \in R^{m_1\times m}$, $P \in R^{n\times n}$ (with $\operatorname{rank}(P) = n - 1$) and $H \in R^{m\times m}$ are randomly selected. We consider problem (1.2) with integral dimension $m_2 = 2$ and $m_2 = 3$, respectively. We choose $h(\omega) = (\omega_1, \omega_2, 12.85, 12.85)$ for $m_2 = 2$ and $h(\omega) = (\omega_1, \omega_2, \omega_3, 12.85)$ for $m_2 = 3$.

We use the same data to compare the use of the simple Monte-Carlo method and the lattice method in Algorithm 3.1. We test $\|\tilde x^* - x_k^N\|$, $|\phi(\tilde x^*) - \phi(x_k^N)|$, $\|F(\theta^k)\|$, computational time and iterations for different $N$, where $x_k^N$ is an approximate solution of (1.3) obtained by Algorithm 3.1, $k$ is the first iteration which satisfies the convergence criterion, and $\tilde x^*$ is an approximate solution of (1.2) obtained by the lattice method with $N = 40028$ (for $m_2 = 2$) and $N = 40024$ (for $m_2 = 3$). The numerical results with convergence criterion $\|F(\theta^k)\| \le 10^{-7}$ are shown in Tables 2 and 3.

Acknowledgements

The authors wish to thank S. Joe, T. Langtry and I.H. Sloan for discussions on multidimensional numerical integration. The authors are also thankful to A.J. King and R.J.-B. Wets for their preprint.
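Where the lattice rule enters the algorithm, only the node set and the periodizing map are needed. A minimal sketch (an assumption; the experiments in the paper were run in Matlab), with a placeholder generator vector z rather than a value from the Joe-Sloan table, and with the weight factors of (4.5) omitted:

```python
import numpy as np

def lattice_points(N, z):
    """Rank-1 lattice points {j*z/N}, j = 0..N-1 (fractional parts)."""
    j = np.arange(N)[:, None]
    return (j * np.asarray(z) / N) % 1.0

def periodize(t):
    """u = t - sin(2*pi*t)/(2*pi), transformation (4.3): maps [0,1]
    to [0,1] and makes a smooth integrand 1-periodic."""
    return t - np.sin(2 * np.pi * t) / (2 * np.pi)

def to_omega(t):
    """Compose (4.1)-(4.3): t in [0,1]^m2  ->  omega in R^m2."""
    u = periodize(t)
    v = np.pi * u - np.pi / 2
    return np.tan(v)

# Usage with a placeholder generator (not from the Joe-Sloan table):
pts = lattice_points(1009, z=[1, 401])
omegas = to_omega(pts)   # integration nodes in R^2 for (4.5)
```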

References

1. J.R. Birge and R.J.-B. Wets, Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse, Mathematical Programming Study 27 (1986) 54-102.

2. R. Fletcher, Practical Methods of Optimization, Second Edition, John Wiley & Sons, Ltd, 1987.

3. S. Joe and I.H. Sloan, Imbedded lattice rules for multidimensional integration, SIAM J. Numerical Analysis 29 (1992) 1119-1135.

4. P. Kall, Stochastic programming - an introduction, Sixth International Conference on Stochastic Programming, Italy, 1992.

5. Y.M. Kaniovski, A.J. King and R.J.-B. Wets, Probabilistic bounds (via large deviations) for the solutions of stochastic programming problems, preprint (1993).

6. L. Nazareth and R.J.-B. Wets, Algorithms for stochastic programs: The case of nonstochastic tenders, Mathematical Programming Study 28 (1986) 1-28.

7. L. Nazareth and R.J.-B. Wets, Nonlinear programming techniques applied to stochastic programs with recourse, in: Numerical Techniques in Stochastic Programming, eds. Y. Ermoliev and R. Wets (Springer-Verlag, Berlin, 1988) 95-119.

8. H. Niederreiter, Multidimensional numerical integration using pseudo random numbers, Mathematical Programming Study 27 (1986) 17-38.

9. J.S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 311-341.

10. A. Prekopa and R.J.-B. Wets, Preface of Stochastic Programming 84, Mathematical Programming Study 28.

11. L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

12. L. Qi, Superlinearly convergent approximate Newton methods for LC^1 optimization problems, to appear in Mathematical Programming.

13. L. Qi and R. Womersley, An SQP algorithm for extended linear-quadratic problems in stochastic programming, Applied Mathematics Preprint AM92/23, School of Mathematics, The University of New South Wales, Sydney, Australia.

14. S.M. Robinson, An implicit-function theorem for a class of nonsmooth equations, Mathematics of Operations Research 16 (1991) 292-309.

15. R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

16. R.T. Rockafellar, Computational schemes for solving large-scale problems in extended linear-quadratic programming, Mathematical Programming 48 (1990) 447-474.

17. R.T. Rockafellar and R.J.-B. Wets, A Lagrangian finite-generation technique for solving linear-quadratic problems in stochastic programming, Mathematical Programming Study 28 (1986) 63-93.

18. R.T. Rockafellar and R.J.-B. Wets, Linear-quadratic problems with stochastic penalties: the finite generation algorithm, in: V.I. Arkin, A. Shiraev and R.J.-B. Wets, eds., Stochastic Optimization, Lecture Notes in Control and Information Sciences 81 (Springer-Verlag, Berlin, 1987) 545-560.

19. I.H. Sloan, Numerical integration in high dimensions - the lattice rule approach, Numerical Integration (1992) 55-69.

20. P. Tseng, Applications of a splitting algorithm to decomposition in convex programming and variational inequalities, SIAM J. Control and Optimization 29 (1991) 119-138.

21. R.J.-B. Wets, Stochastic programming: Solution techniques and approximation schemes, in: Mathematical Programming, The State of the Art - Bonn 1982, eds. A. Bachem, M. Grötschel and B. Korte (Springer-Verlag, Berlin, 1983) 566-603.

22. C. Zhu and R.T. Rockafellar, Primal-dual projected gradient algorithms for extended linear-quadratic programming, SIAM J. Optimization (to appear).

Let n = 20; r = 4;m = 4;m

1

= 2;m

2

= 2: Matrices A 2 R

r�n

; b 2

R

r

; c 2 R

n

; T 2 R

m�n

; q 2 R

m

1

;W 2 R

m

1

�m

; P 2 R

n�n

and H 2 R

m�m

are

randomly generaly. Let = [�; �]� [�; �] and let � be the normal density

as

�(!) =

1

p

2�jCj

1

2

expf�

1

2

(! � �)

T

C

�1

(! � �)g;

where � 2 R

2

is the mean value and C 2 R

2�2

is the coveriance matrix. Let

C = L

T

L be the Choleski factorization of C, � = jCj and ! = Lv+ �. Then

�(!) =

1

p

2��

2

exp(�

1

2

v

T

v):

We chose C = I; � = 0 and � = ��. Let ! = 2�u�� and u = t�

1

2�

sin2�t.

Then

(y) =

Z

��

Z

��

g(y; !)�(!)d!

= (2�)

2

Z

1

0

Z

1

0

g(y; �(2u� 1))�(�(2u � 1))du(=:

1

(y))

= (2�)

2

Z

1

0

Z

1

0

g(y; �(2t�

1

sin2�t� 1))

�(�(2t �

1

sin2�t� 1))(1 � cos2�t

1

)(1� cos2�t

2

)dt(=:

2

(y))

We use simple Monte Carlo method to value

1

(y). Select N seven-decimal

two-dimensional vectors at random from a uniform distribution in [0; 1] �

[0; 1].

We use simple lattice method to value

2

(y). Select N two-dimensional

vectors ffj

(1;2)

N

gg

N�1

j=0

. The braces indicate that each component of the vector

is to be replaced by its fractional part: that is f!g = ! � [!]. [!] denoting

the largest integer which does not exceed.

We comppare the use of the simple Monte-Carlo method and the simple

lattice method in Algorithm 3.2 by testing jjx

N+20

� x

N

jj with di�erent N .

The numerical results are shown in Table 1.

Acknowledgemets

The authors wish to thank S. Joe, T. Langtry and I.H. Sloan for dis-

cussions on multidimensional numerical integration. The authors are also

thankful to A.J. King and R.J.-B. Wets for their preprint.

19

x0=c

0

1

2

3

4

5 x106

0 5 10

Iterations

||fx|

|

10-9

10-6

10-3

100

103

106

0 5 10

Iterations||

x -

x* ||

0

0.5

1

1.5

2

2 4 6 8 10

Iterations

Rat

io ||

F(x

)||

0

2000

4000

6000

8000

10000

2 4 6 8 10

Iterations

||F(x

)||

mean value of the integrand sampled at points chosen from an appropriate

statistical distribution function. The methods are e�ective when integrand

function g(y; !)�(!) is smooth with respect to !. However, the methods

do not converge very fast with the rate of convergence being O(

p

N). The

lattice methods are based on number theory. The methods converge faster

and have sharper error bound than Monte-Carlo methods. However, the in-

tegrand function g(y; !)�(!) is assumed to be 1-periodic in each of its m

2

variables and is assumed to be half-open unit cube [0; 1)

m

2

. Monte-Carlo

methods have been applied to stochastic programming recently. See [5].

However, it seems to that the lattice methods have never been used in this

area. We use a trasformation function ! = q(t) suggested by Sloan to rewrite

� by

�(z) =

Z

1

0

:::

Z

1

0

g(y; q(t))�(q(t))q

0

(t)dt;

where g(y; q(t))�(q(t))q

0

(t) satisfys these assumptions for lattice methods.

18

Zg is unique. Hence p

=5f(y

) = �

1

N

P

N

i=1

z(!

i

; y

) is unique, so �

is the

unique solution of (1.4).

Since A has full row rank, the Lagrange multipliers �

at (x

; y

) corre-

sponding to Ax � b is unique and the linear independence condition of (1.3)

holds at the unique Kuhn-Tucker point �

= (x

; y

; �

; p

)

T

. Furthermore

B-regularity implies that the second-order su�ciency condition of (1.3) is sat-

is�ed at �

. Therefore all the conditions of Lemma 3.1 are satis�ed. Hence

we have

lim

k!1

jjF (~�

k

)jj

jjF (~�

k�1

)jj

= 0:

It implies that there is k

0

such that for all k � k

0

, (3.7) holds. Therefore,

the sequence f�

k

g

1

k

0

generated by Algorithm 3.2 is same as it is generated by

using (3:6)

0

. By Lemma 3.1, Algorithm 3.2 converges superlinearly.

2

4. Numerical Experiments

In this section, we randomly generate problems (1.3) and problems (1.2)

with two dimensional integral. The �rst example is given to demonstrate the

e�ciency of Algorithm 3.2 for problem (1.3). The second example is given to

to test Algorithm 3.2 with the use of Monte-Carlo methods and the lattice

methods.

The numerical experiments were obtained by using a DEC 5000 work station.

Example 1. Let n = 20; r = 4; N = 200;m = 4;m

1

= 2: Matrices

A 2 R

r�n

; b 2 R

r

; c 2 R

n

; T 2 R

m�n

; q 2 R

m

1

;W 2 R

m

1

�m

; h(!) 2 R

m

; P 2

R

n�n

and H 2 R

m�m

are randomly generaly. Meanwhile a solution of (1.3)

with these matrices is generated. We used the trust region method to solve

the quadratic programming (3.6) at each step (see [2] for example). The

numerical results with starting point x

0

= c are shown in Figure 1.

Since the error occurs only from numerical integration (jj�(y) � f(y)jj)

and � is de�ned implicitly as the optimal value of an optimization prob-

lem, we consider choosing a numerical integration rule which o�ers savings

in the number of function evaluations and also o�ers the possibility of error

estimation. Monte-Carlo methods and lattice methods are two popular nu-

merical integration rules. Monte-Carlo methods are based on estimating the

17

Theorem 2.2 in [13], (x

; y

) is unique.

2

Lemma 3.1. Suppose that �

= (x

; y

; �

; p

) 2 R

n

� R

m

� R

r

� R

m

is a Kuhn-Tucker point of (1.3) and �

satis�es the second-order su�ciency

conditions, the strict complementarity condition and the linear independence

condition of (1.3). Then the approximate Newton method

minimize (Px

k

+ c)

T

d

x

+

1

2

d

T

x

(P + �I)d

x

+Q(y

k

)

T

d

y

+

1

2

d

T

y

U(y

k

)d

y

d

x

; d

y

subject to A(x

k

+ d

x

) � b

Td

x

� d

y

= 0: (3:6)

0

x

k+1

= x

k

+ d

x

k

; y

k+1

= y

k

+ d

y

k

:

is well-de�ned and �

k

= (x

k

; y

k

; �

k

; p

k

)

T

superlinearly convergent to �

. If

F (�

k

) 6= 0 with �

k

, and �

k

= (x

k

; y

k

; p

k

), then

lim

k!1

jjF (�

k

)jj

jjF (�

k�1

)jj

= 0:

Proof. By Proposition 2.1 and Theorem 2.1, 5f is semismooth. Since the

contraints of (1.3) are linear and

P 0

0 U(y

k

)

!

2 @

B

Px

k

+ c

5f(y

k

)

!

;

all conditions of Theorem 3.1 in [12] hold. By that theorem, Lemma 3.1 holds.

Theorem 3.3. Let � = 0 and � = 0 in Algorithm 3.2. Suppose that the

algorithm does not stop in a certain iteration. Suppose that the problem

(1.3) is B-regular and that A has full row rank. If the strict complementar-

ity condition holds for (1.3) at the unique Kuhn-Tucker point. Then there

is k

0

such that for all k � k

0

, (3.7) holds and the sequence f�

k

: k � k

0

g

converges superlinearly to the unique solution �

of (1.4).

Proof. By Theorems 3.1 and 3.2, a sequence f�

k

g generated by the algo-

rithm will converge to a solution �

of (1.4). Since (x

; y

) is unique and H

is symmetric positive de�ne, z(!

i

; y

)=argmax

z

f�

1

2

z

T

Hz+(h(!

i

)�y

)

z

; z 2

16

If jjF (~�

k

)jj � �, stop.

If k = 0 go to step 3.

If

jjF (~�

k

)jj

jjF (�

k�1

)jj

� �

1

; (3:7)

let x

k+1

= x

k

+ d

x

k

; y

k+1

= y

k

+ d

y

k

, go to iteration k + 1; otherwise, go to

step 3.

3. Let i

k

be the minimum integer i � 0 such that

�(x

k

+ �

i

d

x

k

; y

k

+ �

i

d

y

k

) � �(x

k

; y

k

)�

2

i

(d

T

x

k

(P + �I)d

x

k

+ d

T

y

k

U(y

k

)d

y

k

):

let x

k+1

= x

k

+ �

i

k

d

x

k

; y

k+1

= y

k

+ �

i

k

d

y

k

:

Theorem 3.2. Let � = 0 in Algorithm 3.2. If Algorithm 3.2 stops at kth it-

eration, then (~x

k

; ~y

k

) is an optimal solution of the problem (1.3). Otherwise,

if step 3 is used for all large k, then two in�nite sequences fx

k

g and fy

k

g are

generated by the algorithm, and the conclusions of Theorem 3.1 hold. Oth-

erwise, two in�nite sequences fx

k

; k 2 K

0

g and fy

k

: k 2 K

0

g are generated

by the algorithm, where K

0

= fk : step 3 is not applied at iteration kg: If x

is an accumulation point of fx

k

; k 2 K

0

g, then y

= Tx

is an accumulation

point of fy

k

; k 2 K

0

g and (x

; y

) is a solution of (1.3). If the problem (1.3)

is B-regular, then (x

; y

) is unique.

Proof. By the same argument as in the proof of Theorem 3.1, if the algo-

rithm stops at the kth iteration, (~x

k

; ~y

k

) is an optimal solution of (1.3). If

step 3 is used for all large k, then we have the same situation as Theorem

3.1. Therefore we assume that the algorithm does not stop at step 1 and

that step 3 is not always used for large k. Then a sequence f�

k

; k 2 K

0

g

is generated by the algorithm, where K

0

is de�ned in the statement of the

theorem. Assume that �

k

is an accumulation point of f�

k

; k 2 K

0

g. Let K

be a subsequence of K

0

such that f�

k

; k 2 Kg conveges to �

. By (3.7), we

have

jjF (�

)jj = lim

k2K

jjF (�

k

)jj = 0:

Hence Kuhn-Tucker conditions of (1.3) hold at (x

; y

). By of Theorem 9.4.2

in [2], (x

; y

) is a solution of (1.3) (cf. the proof of Theorem 3.1). By

15

To consider the superlinear convergence rate, we need the de�nition of

B-regularity [13]. Let v

= (x

; y

)

T

be an optimal solution of (1.3) and let

C

=

P 0

0 U(y

)

!

:

Let v = (x; y)

T

and

E(v) =

Px+ c

5f(y)

!

Let Y = fy j y = Tx; x 2 Xg and

X

� Y

= fx 2 X; y 2 Y : E(v

)

T

(v � v

) = 0g:

Then X

�Y

is called the critical face of X�Y in (1.3) and is independent

of the particular choice of x

[16]. The problem (1.3) is B-regular if

(v � v

)

T

C

(v � v

) > 0; 8 x 2 X

n fx

g;8 y 2 Y

n fy

g;

Clearly, if P + T

T

U(y

)T is positive de�nite, then the problem (1.3) is B-

regular. In particular if P is positive de�nite, so the problem (1.3) is fully

quadratic, then it is B-regular.

Using the technique established in [12, 13], we develop a two-stage algo-

rithm for solving problem (1.3). The �rst stage algorithm is globally con-

vergent. The second stage algorithm is superlinearly convergent under the

B-regularity condition and other conditions.

Algorithm 3.2. Choose �; � 2 (0; 1); � > 0; � > 0; 1 > �

1

> 0; �

0

2 D

satisfying y

0

= Tx

0

. For k � 0:

1. Solve a quadratic program:

minimize (Px

k

+ c)

T

d

x

+

1

2

d

T

x

(P + �I)d

x

+Q(y

k

)

T

d

y

+

1

2

d

T

y

U(y

k

)d

y

d

x

; d

y

subject to A(x

k

+ d

x

) � b

Td

x

� d

y

= 0: (3:6)

2. Let �

k

and p

k

be the Lagrange multipliers at the solution of (3.6) corre-

sponding A(x

k

+d

x

) � b and Td

x

= d

y

, respectively. Let ~�

k

= (~x

k

; ~y

k

; p

k

)

T

=

(x

k

+ d

x

k

; y

k

+ d

y

k

; p

k

)

T

: Calculate F (~�

k

) with �

k

(cf. (2.2)).

14

Algorithm 3.1. Choose �; � 2 (0; 1); � > 0; � > 0; �

0

2 D satisfying

y

0

= Tx

0

. For k � 0:

1. Solve a quadratic program:

minimize (Px

k

+ c)

T

d

x

+

1

2

d

T

x

(P + �I)d

x

+Q(y

k

)

T

d

y

+

1

2

d

T

y

U(y

k

)d

y

d

x

; d

y

subject to A(x

k

+ d

x

) � b

Td

x

� d

y

= 0: (3:5)

2. Let �

k

and p

k

be the Lagrange multipliers at the solution of (3.5) corre-

sponding A(x

k

+d

x

) � b and Td

x

= d

y

; respectively. Let ~�

k

= (~x

k

; ~y

k

; p

k

)

T

=

(x

k

+ d

x

k

; y

k

+ d

y

k

; p

k

)

T

: Calculate F (~�

k

) with �

k

(cf. (2.2)). If jjF (~�

k

)jj � �,

stop; otherwise go to step 3.

3. Let i

k

be the minimum integer i � 0 such that

�(x

k

+ �

i

d

x

k

; y

k

+ �

i

d

y

k

) � �(x

k

; y

k

)�

2

i

(d

T

x

k

(P + �I)d

x

k

+ d

T

y

k

U(y

k

)d

y

k

)

and let x

k+1

= x

k

+ �

i

k

d

x

k

; y

k+1

= y

k

+ �

i

k

d

y

k

:

Theorem 3.1. Let � = 0 in Algorithm 3.1. If Algorithm 3.1 stops at the

k-th iteration, then (~x

k

; ~y

k

) is an optimal solution of the problem (1.3). Oth-

erwise, two sequences fx

k

g and fy

k

g are generated by the algorithm. If x

is an accumulation point of fx

k

g, then y

= Tx

is an accumulation point of

fy

k

g and (x

; y

) is an optimal solution of problem (1.3).

Proof. If the algorithm stops at the kth iteration, then 0 = F (~�

k

) 2 L(~�

k

).

Therefore the Kuhn-Tucker conditions for (1.3) hold at (~x

k

; ~y

k

). By Propo-

sition 2.1 and Theorem 2.1, the objective function

1

2

x

T

Px+ c+ f(y) is con-

tinuously di�erentiable in R

n

� R

m

. Hence (1.3) is a convex programming

in which the objective and the constraints are C

1

functions. By Theorem

9.4.2 in [2], (~x

k

; ~y

k

) is an optimal solution of problem (1.3). If the algorithm

does not stop in a certain iteration, two in�nite sequences are generated. It

is a standard proof to show that an accumulation point (x

; y

) is an optimal

solution of (1.3) (see [13]).

2

13

In this section, we �rst consider how to implement the Newton-likemethod

(1.6) for problem (1.3) and give a practical algorithm. Next, we give a global

convergence theorem. Finally, we show the convergence rate of the algorithm

is superlinear.

Newton-like method (1.6) can be rewritten as

(

solve V

k

d+ F (�

k

) = 0 to get d

k

; (3:1)

let �

k+1

= �

k

+ d

k

:

Since V

k

is symmetric positive semi-de�nite, (3.1) is equivalent to

(

d

k

= argmix

d

fF (�

k

)

T

d +

1

2

d

T

V

k

dg

k+1

= �

k

+ d

k

: (3:2)

Let d = (d

x

; d

y

; d

p

). Consider (3.2) in

D, we have

8

>

<

>

:

d

k

= argmin

d

f(Px

k

+ c)

T

d

x

+

1

2

d

T

x

Pd

x

+Q(y

k

)

T

d

y

+

1

2

d

T

y

U(y

k

)d

y

+(p

k

+ 2d

p

)

T

(Td

x

� d

y

) + d

T

p

(Tx

k

� y

k

) : x

k

+ d

x

2 Xg

k+1

= �

k

+ d

k

: (3:3)

If we take an initial point �

0

2 D satisfying Tx

0

= y

0

and demand

Td

x

= d

y

at each step in (3.3), then we have

minimize (Px

k

+ c)

T

d

x

+

1

2

d

T

x

Pd

x

+Q(y

k

)

T

d

y

+

1

2

d

T

y

U(y

k

)d

y

d

x

; d

y

subject to A(x

k

+ d

x

) � b

Td

x

� d

y

= 0:

x

k+1

= x

k

+ d

x

k

; y

k+1

= y

k

+ d

y

k

: (3:4)

Since P is positive semi-de�nite, the singularity happens when we run

(3.4). To ensure that the algorithm is well-de�ned, we modify (3.4) by using

P + �I where � is a positive number in the generalized Jacobian V

k

. To

have global convergence, we add a line search to (3.4). Denote the objective

function in (1.3) by �(x; y), i.e.

�(x; y) =

1

2

x

T

Px+ c

T

x+ f(y):

12

Obviously, 0 2 S(u). Let d = �

S

(u) � u. Then d 6= 0 and s

T

d = 0 for any

s 2 S(u). Hence �

0

S

(u; d) = �

S(u)

d = 0:

Since �

S

is piecewise smooth, �

S

is semismooth. By Lemma 2.1 in [11],

there exists a U 2 @

B

S

(u) such that �

0

S

(s; d) = Ud = 0. This implies that

U is singular and H

1

2

UH

1

2

2 @

B

Q

i

(y) is singular.

2

Proposition 2.3. For any � 2

D, there is a unique �x = A

T

� 2 R

n

such

that

F (�) =

0

B

@

Px+ c+ T

T

p +A

T

5f(y)� p

Tx� y

1

C

A

; (2:2)

where � � 0; �

T

(Ax� b) = 0. Furthermore, if A has full row rank, then such

� in (2.2) is unique.

Proof. By the de�nition, F (�) = mL(�) and

jjmL(�)jj = argminfjj� jj j � 2 L(�)g:

Let � = �(Px+ c+ T

T

p): Since X is a nonempty polyhedra, @�

X

(x); x 2 X

is a nonempty closed and covex set, so L(�) is. By separation theorems on

convex sets [15], there is a unique F (�) = mL(�) 2 L(�) with smallest norm

and

jjF (�)jj

2

= minimize jj�� �jj

2

+ jj 5 f(y)� pjj

2

+ jjTx� yjj

2

� 2 @�

X

(x)

= jj�

@�

X

(x)

(�)� �jj

2

+ jj 5 f(y)� pjj

2

+ jjTx� yjj

2

= jjA

T

�� �jj

2

+ jj 5 f(y)� pjj

2

+ jjTx� yjj

2

:

Therefore (2.2) holds. If A has full row rank, then � is the unique solution

of A

T

� = �

@�

X

(x)

(�) = �x.

2

3. Algorithm and Convergence

11

where �

k

> 0. Then u

k

=2 S and for any u

k

, there is a neighborhood N

u

k

of

u

k

such that

B

u

k

=

B

u

k

+h

for any u

k

+h 2 N

u

k

. By the discussion above, �

S

is di�erentiable at u

k

and 5�

S

(u

k

) = �

L(u

k

)

. Letting �

k

! 0 and u

k

! u,

then

lim

u

k

!u

L(u

k

)

= �

L(u)

2 @

B

S

(u):

Hence H

1

2

L(u)

H

1

2

2 @

B

Q

i

(�y):

Since 5f = Q is globally Lipschitz in R

m

, F is globally Lipschitz in D.

Moreover, (2.1) holds.

2

Remark 2.1. Let

~

B

u

=

B

u

S

(u)� u

!

:

Then �

L(u)

is a projection from R

m

to the null space N (

~

B

u

) of

~

B

u

: We can

give �

L(u)

by QR decomposition. Let

~

B

u

be an l�mmatrix with rank(

~

B

u

) =

r > 0. Let

~

B

T

u

= QR be a QR decomposition of

~

B

T

u

, where Q = [Q

1

; Q

2

]

is an orthogonal matrix of order m � m (Q

1

2 R

m�r

; Q

2

2 R

m�(m�r)

) and

R = [

R; 0] is an upper triangular matrix of order m� l (

R 2 R

r�l

). Then we

have

L(u)

= Q

2

Q

T

2

:

We can also give the projection by singular value decomposition.

De�nition 2.2. [11]. Let : R

m

! R

m

be a locally Lipschitzian mapping.

We say that is semismooth at y if

lim

U2@

B

(y+td

0

)

d

0

!d;t#0

fUd

0

g

exists for any d 2 R

m

.

Proposition 2.2. IfH

�1

(h(!

i

)�y) =2 Z, then at least one matrix in @

B

Q

i

(y)

is singular.

Proof. By Lemma 2.1, Q

i

is everywhere directionally di�erentiable along

any direction and �

0

S

(u; d) = �

S(u)

d. Let u = H

1

2

(h(!

i

) � y): Then u =2 S:

10

Since H

1

2

�z = �s = �

S

(u) = �

S

(H

1

2

v) where v = h(!

i

)� �y, we have

Q

i

(�y) = ��z = �H

1

2

S

(H

1

2

(h(!

i

)� �y)):

Furthermore, if u 2 D

S

; then Q

i

is di�erentiable at �y and

5Q

i

(�y) = H

1

2

5�

S

(H

1

2

(h(!

i

)� �y))H

1

2

= H

1

2

L(u)

H

1

2

:

Hence (i)-(iii) follow from Lemma 2.1.

iv) If �z = H

�1

v 2 int Z, then u = H

1

2

v 2 int S and

B

u

� 0. Hence

Q

i

is di�erentiable at �y and L(u) = R

m

: This implies 5�

S

(u) = I and

5Q

i

(�y) = H

�1

.

If �z = H

�1

v is on the boundary of Z, then H

�1

2 @

B

Q

i

(�y) by passing to

a subsequence fy

k

: H

�1

(h(!

i

)� y

k

) 2int Zg:

Now we consider the case H

�1

v =2 Z. Since u = H

1

2

v =2 S, there exists

a neighborhood N

u

of u such that for any u+ h with jjhjj su�ciently small,

u+ h =2 S.

If there exists a neighborhood N

u

of u such that for any u + h 2 N

u

B

u

=

B

u+h

, then S(u) = S(u+ h). By Lemma 2.1, we have

S

(u+ h) = �

S

(u) + �

S(u)

(h)

and

S

(u) = �

S

(u+ h) + �

S(u+h)

(�h):

Hence

S(u)

(h) = ��

S(u+h)

(�h) = ��

S(u)

(�h):

Write s = �

S(u)

(h). Then both s and �s belong to S(u). Thus, we must

have

B

u

s = 0. Hence

B

u

S(u)

� 0. By Lemma 2.1, �

S

is di�erentiable

at u, so that Q

i

is di�erentiable at �y and 5Q

i

(�y) = H

1

2

5 �

S

(u)H

1

2

=

H

1

2

L(u)

H

1

2

.

If there is not a neighborhood of u such that

B

u

=

B

u+h

, then there is

an index set J

0

= fi

1

; i

2

; :::; i

j

0

g with j

0

� 1 such that (Bu)

i

j

= q

i

j

; j =

1; 2; :::; j

0

: Take a sequence fu

k

g such that

(Bu

k

)

i

=

(

(Bu)

i

if i =2 J

0

(Bu)

i

+ �

k

if i 2 J

0

,

9

where u = H

1

2

(h(!

i

)� y);

iii)

@

B

Q

i

(y) = H

1

2

lim

u

k

!u

u

k

2D

S

f5�

S

(u

k

)gH

1

2

= H

1

2

lim

u

k

!u

u

k

2D

S

f�

L(u

k

)

gH

1

2

;

where u

k

= H

1

2

(h(!

i

)� y

k

):

iv) Let

U

i

(y) =

(

H

�1

if H

�1

(h(!

i

)� y) 2 Z

H

1

2

L(u)

H

1

2

otherwise.

Then U

i

(y) 2 @

B

Q

i

(y). Furthermore, in D

F (�) =

0

B

@

Px+ c+ T

T

p

5f(y)� p

Tx� y

1

C

A

is globally Lipschitz and

V

=

0

B

@

P 0 T

T

0 U(y) �I

T �I 0

1

C

A

2

0

B

@

P 0 T

T

0 @

B

Q(y) �I

T �I 0

1

C

A

= @

B

F (�); (2:1)

where U(y) =

1

N

P

N

i=1

U

i

(y) and @

B

Q(y) =

1

N

P

N

i=1

@

B

Q

i

(y):

Proof. Let �y be a vector in R

m

and let v = h(!

i

) � �y. Then �z =argmax

z

f�

1

2

z

T

Hz + z

T

v; z 2 Zg if and only if for any z 2 Z,

(z � �z; v �H�z) � 0:

Let s = H

1

2

z: Then �s = H

1

2

�z if and only if for any s 2 S,

(s� �s;H

1

2

v � �s) � 0:

Let u = H

1

2

v. By the De�nition 2.1, �s is the projection of u onto S, i.e.

�s = �

S

(u).

Let

B

u

denote the submatrix of B comprising of rows that correspond to

the active constraints of the inequalities Bs � q at the vector �

S

(u).

8

where B 2 R

m

1

�m

.

For any arbitrary vector u let

B

u

denote the submatrix of B comprising

of rows that correspond to the active constraints of inequalities Bs � q at

the vector �

S

(u). De�ne the polyhedral cone

S(u) = fs : (�

S

(u)� u)

T

s = 0;

B

u

s � 0g

and the lineality space of S(u)

L(u) = S(u) \ (�S(u)):

We summarize properties of the projection function which are used in

this paper.

Lemma 2.1.

(i) [14] �

S

is a contraction, i.e., it is Lipschitz with modulus 1.

(ii) [9] �

S

is everywhere directionally di�erentiable along any direction and

0

S

(u; d) = �

S(u)

(d): Furthermore, for any vector h with jjhjj su�ciently

small

S

(u+ h) = �

S

(u) + �

S(u)

(h):

(iii) [9] �

S

is F -di�erentiable at a vector u if and only if

B

u

S(u)

is identi-

cally equal to zero; in this case, 5�

S

(u) = �

L(u)

; (here

B

u

S(u)

denotes the

composite map: (

B

u

S(u)

)(d) =

B

u

(�

S(u)

(d))):

By Lemma 2.1 and Rademacher's theorem, �

S

is di�erentiable almost

everywhere. Let D

S

be the set where �

S

is di�erentiable.

Theorem 2.1. Let B = WH

1

2

. Then

S = fs j Bs � q; s 2 R

m

g = fs j s = H

1

2

z; z 2 Zg

and

i) Q

i

is di�erentiable almost everywhere;

ii) Q

i

is F -di�erentiable at y if and only if

B

u

S(u)

� 0;

7

In this section, we show that F is globally Lipschitz in D and give the

generalized Jacobian @

B

F (�) for � 2 D.

Proposition 2.1. Let !

i

2 R

m

2

be �xed, i = 1; 2; :::; N ,

Q

i

(y) = �arg max

z

f�

1

2

z

T

Hz + (h(!

i

)� y)

T

z; z 2 Zg

and

Q(y) =

1

N

N

X

i=1

Q

i

(y):

Then Q(y) = 5f(y).

Proof. Since H is positive de�nite, for any y 2 R

m

; !

i

2 R

m

2

there exists a

unique z(y; !

i

) such that z(y; !

i

) =argmax

z

f�

1

2

z

T

Hz+(h(!

i

)�y)

T

z; z 2 Zg.

By Theorem 27.1 in [15] on convex conjugate functions,

g(�; !

i

) = max

z

f�

1

2

z

T

Hz + (h(!

i

)� �)

T

z; z 2 Zg

is di�erentiable at y and

@g

@y

(y; !

i

) = �z(y; !

i

):

Hence we have

5f(y) =

1

N

N

X

i=1

@g

@y

(y; !

i

) = �

1

N

N

X

i=1

z(y; !

i

) =

1

N

N

X

i=1

Q

i

(y) = Q(y):

2

De�nition 2.1. Let S be a nonempty, closed and convex subset of R

m

:

Then the projection of a point u 2 R

m

onto the set S, denoted as �

S

(u)

is de�ned as the solution (which must exist and be unique) to the following

mathematical program:

min

s2S

jju� sjj:

Let S be the polyhedron

S = fs : Bs � q; s 2 R

m

g;

6

consider the nonsmooth equations

F (�) = 0; � 2

D: (1:4)

If �

is a solution of (1.4), then the Kuhn-Tucker conditions for (1.3) hold

at (x

; y

). Since (1.3) is a convex programming and the objective and the

constraints of (1.3) are C

1

functions, by Theorem 9.4.2 in [2], (x

; y

) is a

global solution to (1.3).

In Section 2, we show that F is globally Lipschitz in D. By Rademacher's

theorem, F is di�erentiable almost everywhere inD. LetD

F

be the set where

F is di�erentiable. We use the generalized Jacobian de�ned in [11]

@

B

F (�) = f lim

k

!�

k

2D

F

5F (�

k

)g: (1:5)

We consider the generalized Newton's method

k+1

= �

k

� V

�1

k

F (�

k

); (1:6)

where �

0

2 D, V

k

2 @

B

F (�̂

k

) and

�̂

k

=

(

k

if �

k

2 D;

(x

0

; y

k

; p

k

)

T

otherwise:

In Section 2, we give the generalized Jacobian @

B

F (�) for � 2 D by using

Pang's results on the projection function [9].

In Section 3, we consider how to implement the method (1.6) and give

a practical algorithm. We present a global convergence theorem for the

algorithm and show that the convergence rate of the algorithm is superlinear.

When we get a solution of (1.3), two basic questions are raised: how good

of an estimate is the optimal value of (1.3) for the optimal value of (1.2) and

how well do the solutions of (1.3) approximate the solutions of (1.2) [5]. In

Section 4, while solving (1.3) by our algorithm, we consider which integration

rule can o�er a sharper error bound for jj�(y)� f(y)jj and save the number

of function evaluations. We give numerical experiments to demonstrate the

e�ciency of our algorithm and to compare the use of Monte-Carlo methods

and lattice methods in the algorithm.

2. Generalized Jacobian

5

ELQP [13,16-18,21, 22]. However most of them are less e�cient in the general

case. Even when the problems are fully quadratic, only linear convergence

rates were established. Recently, Qi and Womersely [13] presented a two-

stage sequential quadratic programming method for box-diagonal case and

showed that the rate of convergence of their method is superlinear.

These methods given in [13, 16-18, 22] used the di�erence between the

values of objective of (1.3) and its dual problem as a convergence criterion.

However it is too expensive to calculate both the primal and dual values at

each step for a general problem. When Z is a box the problem correponds

to the simple recourse problem, which is relatively easy. The general case is

typically very hard [4,10].

In this paper, we present a new method. The method is e�cient in gen-

eral case and realizes both global convergence and superlinear convergence.

Furthermore, we do not need to calculate the dual problem.

Throughout this paper, we use the Euclidean norm.

Assume that there exists an optimal solution (x

; y

) of (1.3). According

to Theorem 28.2 in [15], there exists an optimal Lagrange multiplier vector

p

2 R

m

associated with the constraint Tx� y = 0. From the Kuhn-Tucker

conditions for (1.3), we then have

0 2 Px

+ c+ @�

X

(x

) + T

T

p

0 = 5f(y

)� p

0 = Tx

� y

;

where �

X

is the indicator function of X. Since X is a convex set, �

X

is

an extended-valued convex function. By convex analysis [15], @�

X

(x) is the

normal cone to X at x, which is de�ned by

@�

X

(x) =

(

fA

T

� j � � 0; �

T

(Ax� b) = 0g if x 2 X;

; if x =2 X:

Let � = (x; y; p) and let L : R

2m+n

! R

2m+n

be the multifunction given by

L(�) =

0

B

@

Px+ c+ @�

X

(x) + T

T

p

5f(y)� p

Tx� y

1

C

A

:

Let m(L(�)) denote the element of L(�) with the smallest norm and let

F (�) = m(L(�)). Let D =intX � R

m

� R

m

and

D = X � R

m

� R

m

. We

4

computational di�culties occur primarily in evaluation of � (m variables)

and usually m� n. See [6, 7].

Since it is impossible to demand the exact evaluation of the function � and

its gradient, we consider optimal solution obtained by solving approximate

problems of the form:

minimize


[Figure 1]

Since high accuracy in the numerical integration (that is, a small error $\|\Phi(y) - f(y)\|$) is expensive to obtain and $\Phi$ is defined implicitly as the optimal value of an optimization problem, we consider choosing a numerical integration rule which offers savings in the number of function evaluations and also offers the possibility of error estimation. Monte-Carlo methods and lattice methods are two popular classes of numerical integration rules. Monte-Carlo methods estimate the mean value of the integrand, sampled at points chosen from an appropriate statistical distribution. These methods are effective when the integrand $g(y,\omega)\rho(\omega)$ is smooth with respect to $\omega$; however, they do not converge very fast, the rate of convergence being $O(N^{-1/2})$. Lattice methods are based on number theory. They converge faster and have sharper error bounds than Monte-Carlo methods, but the integrand is assumed to be 1-periodic in each of its $m_2$ variables and the integration region is understood to be the unit cube $[0,1)^{m_2}$. Monte-Carlo methods have been applied to stochastic programming recently; see [6]. Lattice methods, however, appear not to have been used in this area before.
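To illustrate the $O(N^{-1/2})$ behaviour referred to above, the following is a minimal sketch (not from the paper) of an equal-weight Monte-Carlo rule on $[0,1]^{m_2}$, with weights $\mu_i = 1/N$ as in (1.3). The test integrand is an illustrative stand-in; the true integrand $g(y,\omega)\rho(\omega)$ requires solving a quadratic program at each point.

```python
import numpy as np

# A minimal sketch of the equal-weight Monte-Carlo rule used in (1.3):
# f_N = (1/N) * sum_i integrand(t_i), with t_i uniform on [0,1)^m2.
# The integrand below is an arbitrary smooth stand-in, not g(y, omega).
def monte_carlo(integrand, m2, N, rng):
    t = rng.random((N, m2))          # N uniform points in [0,1)^m2
    return integrand(t).mean()       # equal weights 1/N

rng = np.random.default_rng(0)
exact = (np.e - 1.0) ** 2            # integral of exp(t1 + t2) over [0,1]^2
for N in (10**3, 10**4, 10**5):
    approx = monte_carlo(lambda t: np.exp(t.sum(axis=1)), 2, N, rng)
    # The error typically shrinks like O(N^(-1/2)).
    print(N, abs(approx - exact))
```

Increasing $N$ by a factor of four roughly halves the error, which is exactly the $N^{-1/2}$ rate.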

We use a transformation $\omega = q(t)$ suggested by Sloan to rewrite $\Phi$ as
$$\Phi(y) = \int_0^1 \cdots \int_0^1 g(y, q(t))\,\rho(q(t))\,q'(t)\,dt,$$
where $g(y, q(t))\rho(q(t))q'(t)$ is 1-periodic in each of its variables.

Let $\Omega = R^{m_2}$ and let $\rho$ be the normal density
$$\rho(\omega) = \frac{1}{(2\pi)^{m_2/2}\,|C|^{1/2}} \exp\{-\tfrac{1}{2}(\omega - \mu)^T C^{-1}(\omega - \mu)\},$$
where $\mu \in R^{m_2}$ is the mean value and $C \in R^{m_2 \times m_2}$ is the covariance matrix. Let $C = L^T L$ be the Choleski factorization of $C$, $\sigma = |C|^{1/2}$ and $\omega = L\xi + \mu$. Then
$$\rho(\omega) = \frac{1}{(2\pi)^{m_2/2}\,\sigma} \exp(-\tfrac{1}{2}\xi^T \xi).$$
Without loss of generality we choose the standard normal density, $C = I$ and $\mu = 0$. Let
$$\omega = \tan v, \qquad (3.1)$$
$$v = \pi u - \frac{\pi}{2}, \qquad (3.2)$$
and
$$u = t - \frac{1}{2\pi}\sin 2\pi t. \qquad (3.3)$$
The sequence of transformations (3.1)-(3.3) is used to go from an integral on $R^{m_2}$ to the integral of a 1-periodic function on the unit cube $[0,1)^{m_2}$; each transformation is applied componentwise to its vector argument. The use of a lattice rule requires the integrand to be 1-periodic, which is achieved by the transformation (3.3); a simple Monte-Carlo method does not require periodicity.
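For concreteness, the composition of (3.1)-(3.3) and its accumulated Jacobian factor, which together produce the 1-periodic integrand appearing in (3.5) below, can be sketched as follows. This is an illustrative implementation, assuming the standard normal case $C = I$, $\mu = 0$ as above; the function `g` is supplied by the caller and the stand-in used in the example is not the recourse function of the paper.

```python
import numpy as np

# Compose (3.1)-(3.3) componentwise: t in (0,1)^m2 -> omega in R^m2,
# and accumulate the Jacobian of the composition. Interior points of
# (0,1)^m2 are assumed; at the boundary tan(v) blows up while the
# Jacobian factor vanishes.
def periodized_integrand(g, y, t):
    u = t - np.sin(2 * np.pi * t) / (2 * np.pi)      # (3.3)
    v = np.pi * u - np.pi / 2                        # (3.2)
    omega = np.tan(v)                                # (3.1)
    # Jacobian: du/dt = 1 - cos(2*pi*t), dv/du = pi, d(omega)/dv = 1/cos^2(v),
    # giving the factor pi^m2 * prod(1 - cos(2*pi*t_i)) / prod(cos^2(v_i)) of (3.5).
    jac = np.prod((1 - np.cos(2 * np.pi * t)) * np.pi / np.cos(v) ** 2)
    m2 = t.size
    rho = np.exp(-0.5 * omega @ omega) / (2 * np.pi) ** (m2 / 2)  # standard normal density
    return g(y, omega) * rho * jac                   # integrand of (3.5)

# Example with a stand-in g (the true g(y, omega) solves a QP in z):
val = periodized_integrand(lambda y, w: np.exp(-w @ w), None, np.array([0.3, 0.7]))
```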

Transformations (3.1)-(3.3) are used as follows:
$$\Phi(y) = \int_{R^{m_2}} g(y,\omega)\,\rho(\omega)\,d\omega = \int_{-\pi/2}^{\pi/2} \cdots \int_{-\pi/2}^{\pi/2} g(y, \tan v)\,\rho(\tan v)\,\frac{1}{\cos^2 v_1 \cdots \cos^2 v_{m_2}}\,dv$$
$$= \pi^{m_2} \int_0^1 \cdots \int_0^1 g\Big(y, \tan\big(\pi u - \tfrac{\pi}{2}\big)\Big)\,\rho\Big(\tan\big(\pi u - \tfrac{\pi}{2}\big)\Big)\,\frac{1}{\cos^2(\pi u_1 - \frac{\pi}{2}) \cdots \cos^2(\pi u_{m_2} - \frac{\pi}{2})}\,du \qquad (3.4)$$
$$= \pi^{m_2} \int_0^1 \cdots \int_0^1 g\Big(y, \tan\big(\pi(t - \tfrac{1}{2\pi}\sin 2\pi t) - \tfrac{\pi}{2}\big)\Big)\,\rho\Big(\tan\big(\pi(t - \tfrac{1}{2\pi}\sin 2\pi t) - \tfrac{\pi}{2}\big)\Big)\,\frac{(1 - \cos 2\pi t_1)\cdots(1 - \cos 2\pi t_{m_2})}{\cos^2\big(\pi(t_1 - \frac{1}{2\pi}\sin 2\pi t_1) - \frac{\pi}{2}\big)\cdots\cos^2\big(\pi(t_{m_2} - \frac{1}{2\pi}\sin 2\pi t_{m_2}) - \frac{\pi}{2}\big)}\,dt. \qquad (3.5)$$

A simple Monte-Carlo method [2] is used to approximate (3.4) by selecting $N$ points uniformly distributed in $[0,1]^{m_2}$ for use in (1.3).

A lattice rule [4, 21] is used to approximate (3.5) by
$$\frac{1}{2^{m_2}\,\ell} \sum_{k_{m_2}=0}^{1} \cdots \sum_{k_1=0}^{1} \sum_{j=0}^{\ell-1} \hat g\Big(y, \Big\{\frac{j z}{\ell} + \frac{(k_1, \ldots, k_{m_2})}{2}\Big\}\Big),$$
where $\hat g(y, \cdot)$ denotes the transformed integrand of (3.5) and $\ell$ is an odd number. The braces indicate that each component of the vector is to be replaced by its fractional part, $\{\omega\} = \omega - [\omega]$, where $[\omega]$ denotes the largest integer not exceeding $\omega$. A "good" lattice rule depends on a good choice of the vector $z$. Very recently Joe and Sloan (private communication) have proposed a table of recommended choices of $z$; for the experiments in this paper we quote part of their table in Appendix A.
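To make the point set concrete, here is a minimal sketch of the rule above, which uses the $\ell \cdot 2^{m_2}$ shifted points $\{jz/\ell + k/2\}$ with equal weights. The parameters come from the first entry of Table 4 in Appendix A ($\ell = 1249$, $z = (512, 1)$, so $N = 4996$ for $m_2 = 2$); the periodic test integrand is an assumption of the example, standing in for $\hat g(y, \cdot)$.

```python
import numpy as np
from itertools import product

# Generate the l * 2^m2 points {j*z/l + k/2} of the copy lattice rule
# and average the integrand with equal weights 1/(l * 2^m2).
def lattice_rule(integrand, z, l):
    m2 = len(z)
    z = np.asarray(z, dtype=float)
    total = 0.0
    for k in product((0.0, 0.5), repeat=m2):           # the 2^m2 shifts k/2
        for j in range(l):
            point = (j * z / l + np.asarray(k)) % 1.0  # fractional part {.}
            total += integrand(point)
    return total / (l * 2 ** m2)

# Smooth 1-periodic test integrand with exact integral 1 over [0,1)^2.
approx = lattice_rule(lambda t: np.prod(1 + np.sin(2 * np.pi * t)), z=(512, 1), l=1249)
print(approx, 1.0)
```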

Let $n = 20$, $r = 8$, $m = 4$, $m_1 = 2$. The matrices $A \in R^{r \times n}$, $b \in R^r$, $c \in R^n$, $T \in R^{m \times n}$, $q \in R^{m_1}$, $W \in R^{m_1 \times m}$, $P \in R^{n \times n}$ (with rank$(P) = n - 1$) and $H \in R^{m \times m}$ are randomly generated. We consider problem (1.2) with integral dimension $m_2 = 2$ and $m_2 = 3$, respectively. We choose $h(\omega) = (\omega_1, \omega_2, 12.85, 12.85)$ for $m_2 = 2$ and $h(\omega) = (\omega_1, \omega_2, \omega_3, 12.85)$ for $m_2 = 3$.

We use the same data to compare the simple Monte-Carlo method and the lattice method in Algorithm 3.1. We report $\|\tilde x^* - x_k^N\|$, $\|\Phi(\tilde x^*) - \Phi(x_k^N)\|$, $\|F(\lambda_k)\|$, the computational time and the number of iterations for different $N$, where $x_k^N$ is an approximate solution of (1.3) obtained by Algorithm 3.1, $k$ is the first iteration which satisfies the convergence criterion, and $\tilde x^*$ is an approximate solution of (1.2) obtained by the lattice method with $N = 40028$ (for $m_2 = 2$) and $N = 40024$ (for $m_2 = 3$). The numerical results with convergence criterion $\|F(\lambda_k)\| \le 10^{-7}$ are shown in Tables 2 and 3.

Table 2: $\|\tilde x^* - x_k^N\|$ and $\|\Phi(\tilde x^*) - \Phi(x_k^N)\|$

m2 = 2             Monte-Carlo method                   Lattice method
N         ||x~* - x_k^N||  ||Phi(x~*) - Phi(x_k^N)||    ||x~* - x_k^N||  ||Phi(x~*) - Phi(x_k^N)||
4996      4.3317e-2        7.6913e-1                    5.2780e-11       3.9790e-13
10012     2.8168e-2        8.2472e-1                    4.5340e-11       7.3896e-13
20012     1.8876e-2        3.3786e-1                    4.2611e-11       2.5011e-12
40028     2.8166e-2        4.7179e-1                    4.8320e-11       8.2423e-13

m2 = 3             Monte-Carlo method                   Lattice method
N         ||x~* - x_k^N||  ||Phi(x~*) - Phi(x_k^N)||    ||x~* - x_k^N||  ||Phi(x~*) - Phi(x_k^N)||
4952      3.0011e-2        1.0059                       2.2733e-9        6.8212e-13
9992      1.1469e-2        4.2687e-1                    4.2307e-5        1.3511e-3
20024     8.8562e-3        1.7231e-1                    4.1973e-5        1.3445e-3
40024     9.1314e-3        3.5135e-1                    4.1938e-5        1.3434e-3

Table 3: Iterations $k$, $\|F(\lambda_k^N)\|$ and computational time

m2 = 2             Monte-Carlo method               Lattice method
N         k (||F(lambda_k^N)||)  Time              k (||F(lambda_k^N)||)  Time
4996      6 (4.4466e-8)          4.5356e2          6 (2.2948e-9)          6.4168e2
10012     6 (4.5627e-10)         9.3588e2          6 (1.9620e-9)          1.5057e3
20012     6 (9.6096e-9)          1.8136e3          6 (1.8602e-9)          2.5447e3
40028     6 (1.2797e-8)          3.5858e3          6 (2.1092e-9)          5.2971e3

m2 = 3             Monte-Carlo method               Lattice method
N         k (||F(lambda_k^N)||)  Time              k (||F(lambda_k^N)||)  Time
4952      7 (8.4431e-8)          6.7217e2          7 (6.7147e-8)          7.6836e2
9992      7 (8.1951e-8)          1.2207e3          7 (5.5788e-8)          1.5347e3
20024     7 (7.2925e-8)          2.5222e3          7 (6.3611e-8)          3.0608e3
40024     7 (5.6547e-8)          4.8433e3          7 (6.2545e-8)          6.3041e3

Acknowledgements

The authors acknowledge discussions with J.S. Pang and D. Ralph on the projection function and the algorithm. The authors wish to thank S. Joe, T. Langtry and I.H. Sloan for discussions on multidimensional numerical integration. The authors are also grateful to A.J. King and R.J.-B. Wets for their preprint.


References

1. J.R. Birge and R.J.-B. Wets, "Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse," Mathematical Programming Study 27 (1986) 54-102.
2. P. Davis and P. Rabinowitz, Methods of Numerical Integration (Academic Press, New York, 1984).
3. R. Fletcher, Practical Methods of Optimization, second edition (John Wiley & Sons, 1987).
4. S. Joe and I.H. Sloan, "Imbedded lattice rules for multidimensional integration," SIAM Journal on Numerical Analysis 29 (1992) 1119-1135.
5. P. Kall, "Stochastic programming - an introduction," in: the Sixth International Conference on Stochastic Programming, Italy, 1992.
6. Y.M. Kaniovski, A.J. King and R.J.-B. Wets, "Probabilistic bounds (via large deviations) for the solutions of stochastic programming problems," preprint (1993).
7. L. Nazareth and R.J.-B. Wets, "Algorithms for stochastic programs: the case of nonstochastic tenders," Mathematical Programming Study 28 (1986) 1-28.
8. L. Nazareth and R.J.-B. Wets, "Nonlinear programming techniques applied to stochastic programs with recourse," in: Y. Ermoliev and R. Wets, eds., Numerical Techniques in Stochastic Programming (Springer-Verlag, Berlin, 1988) pp. 95-119.
9. H. Niederreiter, "Multidimensional numerical integration using pseudorandom numbers," Mathematical Programming Study 27 (1986) 17-38.
10. J.S. Pang, "Newton's method for B-differentiable equations," Mathematics of Operations Research 15 (1990) 311-341.
11. J.S. Pang and L. Qi, "A globally convergent Newton method for convex SC1 minimization problems," Applied Mathematics Report AMR 93/3, School of Mathematics, University of New South Wales, Sydney, Australia (1993).
12. A. Prekopa and R.J.-B. Wets, "Preface of Stochastic Programming 84," Mathematical Programming Study 28 (1986).
13. L. Qi, "Convergence analysis of some algorithms for solving nonsmooth equations," Mathematics of Operations Research 18 (1993) 227-244.
14. L. Qi, "Superlinearly convergent approximate Newton methods for LC1 optimization problems," to appear in Mathematical Programming.
15. L. Qi and R. Womersley, "An SQP algorithm for extended linear-quadratic problems in stochastic programming," Applied Mathematics Preprint AM 92/23, School of Mathematics, University of New South Wales, Sydney, Australia (1992).
16. S.M. Robinson, "An implicit-function theorem for a class of nonsmooth equations," Mathematics of Operations Research 16 (1991) 292-309.
17. R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ, 1970).
18. R.T. Rockafellar, "Computational schemes for solving large-scale problems in extended linear-quadratic programming," Mathematical Programming 48 (1990) 447-474.
19. R.T. Rockafellar and R.J.-B. Wets, "A Lagrangian finite-generation technique for solving linear-quadratic problems in stochastic programming," Mathematical Programming Study 28 (1986) 63-93.
20. R.T. Rockafellar and R.J.-B. Wets, "Linear-quadratic problems with stochastic penalties: the finite generation algorithm," in: V.I. Arkin, A. Shiraev and R.J.-B. Wets, eds., Stochastic Optimization (Lecture Notes in Control and Information Sciences 81, Springer-Verlag, Berlin, 1987) pp. 545-560.
21. I.H. Sloan, "Numerical integration in high dimensions - the lattice rule approach," in: T.O. Espelid and A. Genz, eds., Numerical Integration (Kluwer, 1992) pp. 55-69.
22. P. Tseng, "Applications of a splitting algorithm to decomposition in convex programming and variational inequalities," SIAM Journal on Control and Optimization 29 (1991) 119-138.
23. R.J.-B. Wets, "Stochastic programming: solution techniques and approximation schemes," in: A. Bachem, M. Grötschel and B. Korte, eds., Mathematical Programming: The State of the Art - Bonn 1982 (Springer-Verlag, Berlin, 1983) pp. 566-603.
24. C. Zhu and R.T. Rockafellar, "Primal-dual projected gradient algorithms for extended linear-quadratic programming," to appear in SIAM Journal on Optimization.


Appendix A: Tables of the vector z

Here we give the tables of the vector $z$ obtained by Joe and Sloan. These tables were used in the numerical experiments.

Table 4: Recommended choices of the vector z for m2 = 2

l         l * 2^m2 (= N)    z
1249      4996              (512, 1)
2503      10012             (672, 1)
5003      20012             (1, 1850)
10007     40028             (1, 3822)

Table 5: Recommended choices of the vector z for m2 = 3

l         l * 2^m2 (= N)    z
619       4952              (233, 436, 1)
1249      9992              (1010, 136, 1)
2503      20024             (1, 1868, 1025)
5003      40024             (1, 2271, 1476)
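Each row pairs the modulus $\ell$ with a generating vector $z$, and the point count of the copy rule is $N = \ell \cdot 2^{m_2}$, matching the middle column. The short snippet below, a hypothetical driver not from the paper, shows how the tabulated pairs would feed a rule such as the `lattice_rule` sketch in Section 4.

```python
# Hypothetical driver pairing the (l, z) entries of Tables 4 and 5
# with a copy lattice rule; lattice_rule is the sketch from Section 4.
TABLE_4 = [(1249, (512, 1)), (2503, (672, 1)), (5003, (1, 1850)), (10007, (1, 3822))]
TABLE_5 = [(619, (233, 436, 1)), (1249, (1010, 136, 1)),
           (2503, (1, 1868, 1025)), (5003, (1, 2271, 1476))]

for l, z in TABLE_4:
    N = l * 2 ** len(z)      # total number of points, matching the middle column
    # approx = lattice_rule(integrand, z, l)
    print(l, z, N)
```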
