Horizontal and Vertical Decomposition in Interior Point Methods for Linear Programs



Masakazu Kojima (Department of Information Sciences, Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152, Japan)
Nimrod Megiddo (IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120, and School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel)
Shinji Mizuno (The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan)
Susumu Shindoh (Department of Mathematics and Physics, The National Defense Academy, Hashirimizu 1-10-20, Yokosuka, Kanagawa 239, Japan)

Research supported in part by ONR Contract N00014-94-C-0007.

Abstract. Corresponding to the linear program

Maximize $c^Tx$ subject to $Ax = a$, $Bx = b$, $x \geq 0$,

we introduce two functions in the penalty parameter $t > 0$ and the Lagrange relaxation parameter vector $w$:

\[ \tilde f^p(t,w) = \max\Big\{ c^Tx - w^T(Ax - a) + t\sum_{j=1}^n \ln x_j : Bx = b,\ x > 0 \Big\} \]
(for horizontal decomposition),
\[ \tilde f^d(t,w) = \min\Big\{ a^Tw + b^Ty - t\sum_{j=1}^n \ln z_j : B^Ty - z = c - A^Tw,\ z > 0 \Big\} \]
(for vertical decomposition).

For each $t > 0$, $\tilde f^p(t,\cdot)$ and $\tilde f^d(t,\cdot)$ are strictly convex $C^\infty$ functions with a common minimizer $w(t)$, which converges to an optimal Lagrange multiplier vector $w^*$ associated with the constraint $Ax = a$ as $t \to 0$, and they enjoy the strong self-concordance property given by Nesterov and Nemirovsky. Based on these facts, we present conceptual algorithms that trace the trajectory $\{w(t) : t > 0\}$ with the use of Newton's method, and we analyze their computational complexity.

1. Introduction.

This paper presents a theoretical framework for incorporating horizontal and vertical decomposition techniques into interior-point methods (see for example the survey paper [13]) for linear programs. Let $a \in R^m$, $b \in R^k$, $c \in R^n$, $A \in R^{m \times n}$ and $B \in R^{k \times n}$. We consider the following equality form linear program $P^*$ and its dual $D^*$:

$P^*$: Maximize $c^Tx$ subject to $Ax = a$, $Bx = b$, $x \geq 0$.
$D^*$: Minimize $a^Tw + b^Ty$ subject to $A^Tw + B^Ty - z = c$, $z \geq 0$.

Let

$R_{++} = \{ t \in R : t > 0 \}$ (the set of positive numbers);
$R^n_{++} = \{ x \in R^n : x > 0 \}$ (the positive orthant);
$P_{++} = \{ x \in R^n_{++} : Ax = a,\ Bx = b \}$ (the interior of the feasible region of $P^*$);
$D_{++} = \{ (w,y,z) \in R^m \times R^k \times R^n_{++} : A^Tw + B^Ty - z = c \}$ (the interior of the feasible region of $D^*$);
$Q_{++} = \{ x \in R^n_{++} : Bx = b \}$;
$D_{++}(w) = \{ (y,z) \in R^k \times R^n_{++} : B^Ty - z = c - A^Tw \}$.

Then we obviously see that

\[ P_{++} = \{ x \in R^n : Ax = a \} \cap Q_{++}, \qquad D_{++} = \bigcup_{w \in R^m} \{ (w,y,z) \in R^{m+k+n} : (y,z) \in D_{++}(w) \}. \]

Throughout the paper we impose the following assumptions on the constraint set of $P^*$:

(A) $P_{++}$ is nonempty and bounded.
(B) $Q_{++}$ is nonempty and bounded.
(C) The $(m+k) \times n$ matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ has full row rank.

For every $(t,x,w,y,z) \in R_{++} \times R^n_{++} \times R^m \times R^k \times R^n_{++}$, define

\[ f^p(t,w,x) = c^Tx - w^T(Ax - a) + t\sum_{j=1}^n \ln x_j, \qquad f^d(t,w,y,z) = a^Tw + b^Ty - t\sum_{j=1}^n \ln z_j. \]

Here $t \in R_{++}$ denotes the barrier parameter and $w \in R^m$ the Lagrange multiplier vector associated with the constraint $Ax = a$.

We can regard the horizontal (or vertical) decomposition technique as a numerical tracing of the trajectory which consists of the solutions $w(t)$ ($t \in R_{++}$) of the following parametric min-max (or min-min) problem:

\[ \mathop{\mathrm{minimize}}_{w \in R^m}\ \mathop{\mathrm{maximize}}\ \{ f^p(t,w,x) : x \in Q_{++} \}, \qquad (1) \]
\[ \mathop{\mathrm{minimize}}_{w \in R^m}\ \mathop{\mathrm{minimize}}\ \{ f^d(t,w,y,z) : (y,z) \in D_{++}(w) \}. \qquad (2) \]
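As a concrete reference point, the two objective functions above are straightforward to evaluate numerically. The following is a minimal numpy sketch, assuming the problem data $(A, B, a, b, c)$ are given as arrays; the function names are ours, and no inner optimization over $Q_{++}$ or $D_{++}(w)$ is attempted here.

```python
import numpy as np

def f_p(t, w, x, A, a, c):
    """Horizontal objective f^p(t,w,x) = c'x - w'(Ax - a) + t * sum_j ln x_j."""
    return c @ x - w @ (A @ x - a) + t * np.sum(np.log(x))

def f_d(t, w, y, z, a, b):
    """Vertical objective f^d(t,w,y,z) = a'w + b'y - t * sum_j ln z_j."""
    return a @ w + b @ y - t * np.sum(np.log(z))
```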

Under the assumptions (A), (B) and (C), the inner maximization problem in (1) (or the inner minimization problem in (2)) has a unique solution for every $(t,w) \in R_{++} \times R^m$, so that we can consistently define the optimal value functions and the optimizers of the inner problems; for every $(t,w) \in R_{++} \times R^m$, let

\[ \tilde f^p(t,w) = \max\{ f^p(t,w,x) : x \in Q_{++} \}, \qquad \tilde x(t,w) = \arg\max\{ f^p(t,w,x) : x \in Q_{++} \}, \]
\[ \tilde f^d(t,w) = \min\{ f^d(t,w,y,z) : (y,z) \in D_{++}(w) \}, \qquad (\tilde y(t,w), \tilde z(t,w)) = \arg\min\{ f^d(t,w,y,z) : (y,z) \in D_{++}(w) \}. \]

We are mainly concerned with theoretical aspects of the horizontal and vertical decomposition techniques. In particular we show the following features.

(a) For every $t \in R_{++}$, the function $\tilde f^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with the unique minimizer $w(t)$ over $R^m$.

(b) For every $(t,w) \in R_{++} \times R^m$, we can use the primal-dual interior-point method ([5, 7, 8, 9, 11], etc.) to compute $(\tilde x(t,w), \tilde y(t,w), \tilde z(t,w))$, $\tilde f^p(t,w)$, the gradient vector $\tilde f^p_w(t,w)$ with respect to $w$ and the Hessian matrix $\tilde f^p_{ww}(t,w)$ with respect to $w$.

(c) The set $\{ (\tilde x(t,w(t)), w(t), \tilde y(t,w(t)), \tilde z(t,w(t))) : t \in R_{++} \}$ forms the central trajectory [9] converging to the analytic center of the optimal solution set of the primal-dual pair of linear programs $P^*$ and $D^*$.

(d) $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$ is a strongly self-concordant family [10] (with the parameter functions $\alpha(t) = t$, $\theta(t) = \sqrt{n}/t$ and $\sigma(t) = \sqrt{n}/(2t)$, the two remaining parameter functions of [10] being identically $1$).

(e) $\tilde f^d(t,w) - \tilde f^p(t,w) = nt(1 - \ln t)$ for every $(t,w) \in R_{++} \times R^m$. Note that the right-hand side is independent of $w$. Hence (a), (b), (c) and (d) above remain valid even if we replace $\tilde f^p$ by $\tilde f^d$.

In view of the features (a), (b), (c) and (e) above, we obtain approximate optimal solutions of $P^*$ and $D^*$ if we numerically trace the trajectory $\{w(t) : t > 0\}$ in the $w$-space until the barrier parameter $t \in R_{++}$ gets sufficiently small. It should be emphasized that $w(t)$ is the common minimizer of the strictly convex $C^\infty$ functions $\tilde f^p(t,\cdot)$ and $\tilde f^d(t,\cdot)$ over the entire space $R^m$. Thus we can utilize various unconstrained minimization techniques, such as Newton's, quasi-Newton and conjugate gradient methods, to approximate $w(t)$.

The feature (d) is a major theoretical contribution of this paper, which makes it possible to utilize Newton's method effectively for tracing the trajectory $\{w(t) : t > 0\}$. The notion and the theory of self-concordance were given by Nesterov and Nemirovsky [10] for a wide class of polynomial-time interior-point methods for convex programming.

After listing in Section 2 the symbols and notation used throughout the paper, we show in Section 3 that the inner optimization problems in (1) and (2) share a common necessary and sufficient optimality condition, and derive the feature (e) from that condition.

Section 4 describes several derivatives of the function $\tilde f^p : R_{++} \times R^m \to R$, including the gradient $\tilde f^p_w(t,w)$ and the positive definite Hessian matrix $\tilde f^p_{ww}(t,w)$ with respect to $w \in R^m$. Details of the calculation of derivatives are given in the Appendix. Section 5 establishes that the family $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$ is strongly self-concordant. In Section 6 we present conceptual decomposition algorithms with the use of Newton's method, and investigate their polynomial-time computational complexity.

Although we state a horizontal decomposition for the general linear program $P^*$ and a vertical decomposition mainly for its dual $D^*$, it is interesting in practice to apply them to a special case where the matrix $B$ has a block diagonal structure:

\[ B = \begin{pmatrix} B_1 & O & \cdots & O \\ O & B_2 & \cdots & O \\ & & \ddots & \\ O & O & \cdots & B_\ell \end{pmatrix}. \qquad (3) \]

Our horizontal and vertical decomposition techniques applied to such special cases correspond roughly to the Dantzig-Wolfe and the Benders decomposition methods, respectively. See for example the book [6]. It should be noted that when the matrix $B$ has a block diagonal structure as in (3), we can decompose the inner maximization problem in (1) (or the inner minimization problem in (2)) into $\ell$ smaller subproblems of the same type, as the sketch at the end of this section illustrates.

Many studies ([1, 2, 12], etc.) have been done on how to exploit special structures such as generalized upper bounds, block diagonal structures and staircase structures in interior-point methods. The conceptual algorithms given in Section 6 have a close relation with a compact-inverse implementation (see Section 9 of [2]) of the primal-dual interior-point method applied to $P^*$ and $D^*$. This is shown in Section 7, where we discuss some issues toward implementation of the horizontal and vertical decomposition techniques.
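To make the separability claim for (3) concrete, here is a small numpy sketch; the blocks $B_1$, $B_2$ and the vectors are hypothetical examples, not data from the paper. Since each row of a block diagonal $B$ touches only one group of variables, the constraint $Bx = b$ and the separable objective $(c - A^Tw)^Tx + t\sum_j \ln x_j$ split into $\ell$ independent subproblems.

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical diagonal blocks B_1, B_2 of (3), with x split accordingly.
B1 = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]])
B2 = np.array([[1.0, 2.0, 1.0]])
B = block_diag(B1, B2)  # the block diagonal matrix in (3) with l = 2

# Bx stacks B_1 x_1 and B_2 x_2, so the inner problem over {x : Bx = b, x > 0}
# decomposes into one subproblem per block (solvable independently).
x1, x2 = np.ones(3), np.ones(3)
assert np.allclose(B @ np.concatenate([x1, x2]),
                   np.concatenate([B1 @ x1, B2 @ x2]))
```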

2. Notation.

$R_{++} = \{ t \in R : t > 0 \}$; $R^n_{++} = \{ x \in R^n : x > 0 \}$;
$P_{++} = \{ x \in R^n_{++} : Ax = a,\ Bx = b \}$;
$D_{++} = \{ (w,y,z) \in R^m \times R^k \times R^n_{++} : A^Tw + B^Ty - z = c \}$;
$Q_{++} = \{ x \in R^n_{++} : Bx = b \}$;
$D_{++}(w) = \{ (y,z) \in R^k \times R^n_{++} : B^Ty - z = c - A^Tw \}$;
$f^p(t,w,x) = c^Tx - w^T(Ax - a) + t\sum_{j=1}^n \ln x_j$;
$f^d(t,w,y,z) = a^Tw + b^Ty - t\sum_{j=1}^n \ln z_j$;
$\tilde f^p(t,w) = \max\{ f^p(t,w,x) : x \in Q_{++} \}$;
$\tilde x(t,w) = \arg\max\{ f^p(t,w,x) : x \in Q_{++} \}$;
$\tilde f^d(t,w) = \min\{ f^d(t,w,y,z) : (y,z) \in D_{++}(w) \}$;
$(\tilde y(t,w), \tilde z(t,w)) = \arg\min\{ f^d(t,w,y,z) : (y,z) \in D_{++}(w) \}$;
$\tilde X = \tilde X(t,w) = \mathrm{diag}\,\tilde x(t,w)$; $\tilde Z = \tilde Z(t,w) = \mathrm{diag}\,\tilde z(t,w)$;
$\tilde \Lambda = \tilde X^{1/2}\tilde Z^{-1/2}$;
$w(t) = \arg\min\{ \tilde f^p(t,w) : w \in R^m \}$;
$\lambda^* = 2 - 3^{1/2} = 0.2679\ldots$; $\bar\lambda = 1/4 = 0.25$.

3. Duality between $\tilde f^p$ and $\tilde f^d$.

Given $t > 0$ and $w \in R^m$, the term $a^Tw$ involved in the objective function $f^p(t,w,\cdot)$ of the inner problem in (1) and in the objective function $f^d(t,w,\cdot,\cdot)$ of the inner problem in (2) is constant. Hence we may eliminate it from the objective functions when we are concerned with the inner problems in (1) and (2). Thus the pair of inner problems is equivalent to the pair:

$P(t,w)$: Maximize $(c - A^Tw)^Tx + t\sum_{j=1}^n \ln x_j$ subject to $x \in Q_{++}$.
$D(t,w)$: Minimize $b^Ty - t\sum_{j=1}^n \ln z_j$ subject to $(y,z) \in D_{++}(w)$.

This pair has exactly the same structure as the one often studied with the primal-dual interior-point method ([5, 7, 8, 9, 11], etc.). Therefore we can apply the primal-dual interior-point method to the pair of problems above. By the well-known argument (see for example the paper [9]) we obtain the theorem below, which states a common necessary and sufficient optimality condition for the inner maximization problem in (1) (i.e., $P(t,w)$) and the inner minimization problem in (2) (i.e., $D(t,w)$).

Theorem 3.1. Let $(t,w) \in R_{++} \times R^m$. Then $x \in R^n$ is a maximizer of $f^p(t,w,\cdot)$ over $Q_{++}$ and $(y,z) \in R^{k+n}$ is a minimizer of $f^d(t,w,\cdot,\cdot)$ over $D_{++}(w)$ if and only if

\[ Bx - b = 0,\ x > 0; \qquad A^Tw + B^Ty - z - c = 0,\ z > 0; \qquad Xz - te = 0. \qquad (4) \]

Here $X$ denotes the diagonal matrix of the coordinates of $x \in R^n$ and $e = (1, \ldots, 1)^T \in R^n$.
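Condition (4) is easy to monitor numerically. Below is a minimal sketch, under our own naming, that returns the three residual vectors; at the exact inner optimizers $\tilde x(t,w)$ and $(\tilde y(t,w), \tilde z(t,w))$ all of them vanish (up to round-off).

```python
import numpy as np

def residuals_of_4(t, w, x, y, z, A, B, b, c):
    """Residuals of the optimality condition (4)."""
    r_primal = B @ x - b                    # Bx - b = 0
    r_dual = A.T @ w + B.T @ y - z - c      # A'w + B'y - z - c = 0
    r_comp = x * z - t                      # Xz - te = 0 (componentwise)
    return r_primal, r_dual, r_comp
```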

Let $(t,w) \in R_{++} \times R^m$ and suppose that $(x,y,z)$ satisfies (4). Then $(w,y,z)$ is a feasible solution of the original dual linear program $D^*$. Hence $a^Tw + b^Ty$ gives an upper bound on the maximal objective value of the original primal linear program $P^*$ to be solved. On the other hand, $x$ is not a feasible solution of $P^*$ unless the additional equality $Ax - a = 0$ holds. It will be shown in Section 4 that the equality holds when and only when $w$ is a minimizer of the outer problem in (1), i.e., a minimizer of $\tilde f^p(t,\cdot)$ over $R^m$.

Recall that $\tilde x(t,w)$ and $(\tilde y(t,w), \tilde z(t,w))$ denote the maximizer of the inner problem in (1) and the minimizer of the inner problem in (2), respectively. For every fixed $w \in R^m$, the set $\{ (\tilde x(t,w), \tilde y(t,w), \tilde z(t,w)) : t \in R_{++} \}$ forms the central trajectory of the primal-dual pair of linear programs:

$P(w)$: Maximize $(c - A^Tw)^Tx$ subject to $Bx = b$, $x \geq 0$.
$D(w)$: Minimize $b^Ty$ subject to $B^Ty - z = c - A^Tw$, $z \geq 0$.

Theorem 3.1 implies that the relations

\[ B\tilde x(t,w) - b = 0,\ \tilde x(t,w) > 0; \qquad A^Tw + B^T\tilde y(t,w) - \tilde z(t,w) - c = 0,\ \tilde z(t,w) > 0; \qquad \tilde X(t,w)\tilde z(t,w) - te = 0 \qquad (5) \]

hold for every $(t,w) \in R_{++} \times R^m$. It follows that

\[
\tilde f^d(t,w) - \tilde f^p(t,w) = b^T\tilde y(t,w) - c^T\tilde x(t,w) + w^TA\tilde x(t,w) - t\sum_{j=1}^n \ln \tilde x_j(t,w)\tilde z_j(t,w)
\]
\[
= \left( A^Tw + B^T\tilde y(t,w) - c \right)^T \tilde x(t,w) - t\sum_{j=1}^n \ln t = \tilde x(t,w)^T\tilde z(t,w) - nt\ln t = nt(1 - \ln t).
\]

Therefore we obtain:

Theorem 3.2. $\tilde f^d(t,w) - \tilde f^p(t,w) = nt(1 - \ln t)$ for every $(t,w) \in R_{++} \times R^m$.
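Theorem 3.2 can be checked numerically without solving any optimization problem: given $t$ and any $x > 0$, $w$, $y$, one can define $z$, $c$ and $b$ so that (4) holds by construction. A minimal sketch, with randomly generated illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n, t = 2, 3, 6, 0.5

# Pick x > 0, w, y freely; then choose z, c, b so that (4) holds exactly.
A = rng.standard_normal((m, n)); B = rng.standard_normal((k, n))
a = rng.standard_normal(m);      w = rng.standard_normal(m)
y = rng.standard_normal(k);      x = rng.uniform(0.5, 2.0, n)
z = t / x                  # Xz = te
c = A.T @ w + B.T @ y - z  # dual feasibility in (4)
b = B @ x                  # primal feasibility in (4)

f_p = c @ x - w @ (A @ x - a) + t * np.sum(np.log(x))
f_d = a @ w + b @ y - t * np.sum(np.log(z))
print(f_d - f_p, n * t * (1 - np.log(t)))  # the two numbers coincide
```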

4. Derivatives.

In this section, we present various derivatives of $\tilde f^p : R_{++} \times R^m \to R$. Details of the calculation of derivatives are omitted here but are given in the Appendix.

Theorem 4.1.
(i) $\tilde f^p_w(t,w) = a - A\tilde x(t,w)$ for every $(t,w) \in R_{++} \times R^m$.
(ii) For every $(t,w) \in R_{++} \times R^m$,
\[ \tilde f^p_{ww}(t,w) = A\tilde\Lambda \left( I - \tilde\Lambda B^T (B\tilde\Lambda^2 B^T)^{-1} B\tilde\Lambda \right) \tilde\Lambda A^T = \frac{1}{t} A\tilde X \left( I - \tilde X B^T (B\tilde X^2 B^T)^{-1} B\tilde X \right) \tilde X A^T. \]

Here $\tilde\Lambda = \tilde X^{1/2}\tilde Z^{-1/2}$, $\tilde X = \tilde X(t,w) = \mathrm{diag}\,\tilde x(t,w)$ and $\tilde Z = \tilde Z(t,w) = \mathrm{diag}\,\tilde z(t,w)$.

As a corollary we obtain:

Theorem 4.2. Let $t \in R_{++}$ be fixed arbitrarily.
(i) The Hessian matrix $\tilde f^p_{ww}(t,w)$ is positive definite for every $w \in R^m$.
(ii) The function $\tilde f^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with a unique minimizer over $R^m$.

Proof: (i) Obviously the Hessian matrix
\[ \tilde f^p_{ww}(t,w) = A\tilde\Lambda \left( I - \tilde\Lambda B^T (B\tilde\Lambda^2 B^T)^{-1} B\tilde\Lambda \right) \tilde\Lambda A^T \]
is positive semi-definite. Hence it suffices to show that the matrix is nonsingular. Assume on the contrary that
\[ A\tilde\Lambda \left( I - \tilde\Lambda B^T (B\tilde\Lambda^2 B^T)^{-1} B\tilde\Lambda \right) \tilde\Lambda A^T u = 0 \]
for some nonzero $u \in R^m$. Since $I - \tilde\Lambda B^T (B\tilde\Lambda^2 B^T)^{-1} B\tilde\Lambda$ is a projection matrix, we have
\[ \left( I - \tilde\Lambda B^T (B\tilde\Lambda^2 B^T)^{-1} B\tilde\Lambda \right) \tilde\Lambda A^T u = 0, \]
which implies that
\[ A^T u - B^T (B\tilde\Lambda^2 B^T)^{-1} B\tilde\Lambda^2 A^T u = 0. \]
This contradicts the assumption (C).

(ii) The strict convexity of $\tilde f^p(t,\cdot)$ follows directly from (i). We have observed that the mappings $\tilde x, \tilde y, \tilde z$ on $R_{++} \times R^m$ are defined implicitly through the equalities in (4) (see (5)). Since the Jacobian matrix of the left-hand side of those equalities with respect to $(x,y,z)$ is nonsingular at every $(t,w,x,y,z) \in R_{++} \times R^m \times R^n_{++} \times R^k \times R^n_{++}$, and the left-hand side is $C^\infty$ differentiable with respect to $(t,w,x,y,z)$, we see by the implicit function theorem (for example [3]) that the mappings $\tilde x, \tilde y, \tilde z$ are $C^\infty$ differentiable with respect to $(t,w)$. It is well known [9] that under the assumptions (A) and (C) there exists a unique $(w,x,y,z)$ for which

\[ Ax - a = 0,\ Bx - b = 0,\ x > 0; \qquad A^Tw + B^Ty - z - c = 0,\ z > 0; \qquad Xz - te = 0 \qquad (6) \]

holds. We know by Theorems 3.1 and 4.1 that (6) is a necessary and sufficient condition for $w \in R^m$ to be a minimizer of $\tilde f^p(t,\cdot)$ over $R^m$. This ensures the existence and uniqueness of the minimizer of $\tilde f^p(t,\cdot)$.
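Given the exact inner maximizer $x = \tilde x(t,w)$, the formulas of Theorem 4.1 translate directly into numpy. A minimal sketch (the explicit inverse is avoided in favor of a linear solve; dense linear algebra is assumed purely for illustration):

```python
import numpy as np

def gradient_and_hessian(t, x, A, B, a):
    """Gradient a - Ax and Hessian (1/t) A X (I - X B'(B X^2 B')^{-1} B X) X A'
    of f~p(t, .) at w, per Theorem 4.1, with x = x~(t, w)."""
    X = np.diag(x)
    G = B @ X @ X @ B.T                      # B X^2 B'
    P = np.eye(len(x)) - X @ B.T @ np.linalg.solve(G, B @ X)
    grad = a - A @ x
    hess = (A @ X @ P @ X @ A.T) / t
    return grad, hess
```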

For every $t \in R_{++}$, define $w(t)$ to be the minimizer of $\tilde f^p(t,\cdot)$ over $R^m$, whose existence and uniqueness have been shown above; i.e., $w(t) = \arg\min\{ \tilde f^p(t,w) : w \in R^m \}$. In view of Theorem 3.2, we see that $w(t) = \arg\min\{ \tilde f^d(t,w) : w \in R^m \}$. Furthermore, for every $t \in R_{++}$, $(x,w,y,z) = (\tilde x(t,w(t)), w(t), \tilde y(t,w(t)), \tilde z(t,w(t)))$ satisfies (6). Thus the set $\{ (\tilde x(t,w(t)), w(t), \tilde y(t,w(t)), \tilde z(t,w(t))) : t \in R_{++} \}$ forms the central trajectory converging to the analytic center of the optimal solution set of the primal-dual pair of linear programs $P^*$ and $D^*$.

The theorem below will be utilized in Section 5, where we discuss the self-concordance property of the family $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$ of functions.

Theorem 4.3. Let $w \in R^m$ and $h \in R^m$ be fixed arbitrarily. For every $(t,s) \in R_{++} \times R$, let

\[ X(t,s) = \tilde X(t, w + sh), \]
\[ u(t,s) = \left( I - X(t,s)B^T (BX(t,s)^2B^T)^{-1} BX(t,s) \right) X(t,s) A^T h, \]
\[ v(t,s) = \left( I - X(t,s)B^T (BX(t,s)^2B^T)^{-1} BX(t,s) \right) e. \]

Then the following (i)-(v) hold for every $s \in R$.

(i) $\dfrac{d\tilde f^p(t,w+sh)}{ds} = (a - A\tilde x(t,w+sh))^T h$.

(ii) $\dfrac{d^2\tilde f^p(t,w+sh)}{ds^2} = \dfrac{1}{t}\displaystyle\sum_{j=1}^n u_j(t,s)^2$.

(iii) $\dfrac{d^3\tilde f^p(t,w+sh)}{ds^3} = -\dfrac{2}{t^2}\displaystyle\sum_{j=1}^n u_j(t,s)^3$.

(iv) $\dfrac{d^2\tilde f^p(t,w+sh)}{dt\,ds} = -\dfrac{u(t,s)^T v(t,s)}{t}$.

(v) $\dfrac{d^3\tilde f^p(t,w+sh)}{dt\,ds^2} = \dfrac{u(t,s)^T (2\,\mathrm{diag}\,v(t,s) - I)\, u(t,s)}{t^2}$.

5. Self-Concordance.

In this section, we apply the notion and the theory of self-concordance to the family $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$ of functions according to the paper [10] by Nesterov and Nemirovsky. To avoid complicated notation and discussion, however, we employ a simplified version of self-concordance. Our definition of "global self-concordance" and the related properties shown below are less general than the original strong self-concordance given in the paper [10], but they can be adapted directly to the family $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$. Readers who are familiar with strong self-concordance can easily see that if a family $\{F(t,\cdot) : t \in R_{++}\}$ of functions on $R^m$ is globally self-concordant, then $\{R^m, F(t,\cdot), R^m\}_{t \in R_{++}}$ is strongly self-concordant.

Definition 5.1. A function $F$ on $R^m$ is globally self-concordant with the parameter value $\alpha > 0$ if it satisfies:

(a) $F$ is a strictly convex $C^3$ function on $R^m$ with a positive definite Hessian matrix $F_{ww}(w)$ at every $w \in R^m$.
(b) There is a unique global minimizer of $F$ over $R^m$.
(c) For every $w \in R^m$ and $h \in R^m$,
\[ \left| \frac{d^3F(w+sh)}{ds^3}\Big|_{s=0} \right| \leq 2\alpha^{-1/2} \left( \frac{d^2F(w+sh)}{ds^2}\Big|_{s=0} \right)^{3/2}. \]

Definition 5.2. A family $\{F(t,\cdot) : t \in R_{++}\}$ of functions on $R^m$ is globally self-concordant with the positive parameter functions $\alpha$, $\theta$ and $\sigma$ if it satisfies:

(d) $\alpha, \theta, \sigma : R_{++} \to R_{++}$ are continuous functions.
(e) For every $t \in R_{++}$, $F(t,\cdot)$ is a globally self-concordant function on $R^m$ with the parameter value $\alpha(t)$.
(f) $F(t,w)$, $F_w(t,w)$ and $F_{ww}(t,w)$ are differentiable in $t \in R_{++}$, and the resultant derivatives in $t \in R_{++}$ are continuous in $(t,w) \in R_{++} \times R^m$.
(g) For every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$,
\[ \left| \frac{d^2F(t,w+sh)}{dt\,ds}\Big|_{s=0} \right| \leq \theta(t)\,\alpha(t)^{1/2} \left( \frac{d^2F(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{1/2}. \]
(h) For every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$,
\[ \left| \frac{d^3F(t,w+sh)}{dt\,ds^2}\Big|_{s=0} \right| \leq 2\sigma(t) \left( \frac{d^2F(t,w+sh)}{ds^2}\Big|_{s=0} \right). \]

Let $\{F(t,\cdot) : t \in R_{++}\}$ be a globally self-concordant family of functions on $R^m$ with the positive parameter functions $\alpha, \theta, \sigma : R_{++} \to R_{++}$. For every $t \in R_{++}$, define Newton's decrement of the function $F(t,\cdot)$ at $w \in R^m$ by

\[ \lambda(t,w) = \inf\{ \lambda : |F_w(t,w)^T h| \leq \lambda\,\alpha(t)^{1/2} \left( h^T F_{ww}(t,w)\,h \right)^{1/2} \text{ for every } h \in R^m \}. \]

Alternatively, $\lambda(t,w)$ is defined by

\[ \lambda(t,w)^2 = \frac{2\left( F(t,w) - \inf\{\varphi(t,u) : u \in R^m\} \right)}{\alpha(t)} = \frac{1}{\alpha(t)}\, F_w(t,w)^T F_{ww}(t,w)^{-1} F_w(t,w), \]

where $\varphi(t,\cdot)$ is the quadratic approximation of the function $F(t,\cdot)$ at $w \in R^m$:

\[ \varphi(t,u) = F(t,w) + F_w(t,w)^T(u-w) + \frac{1}{2}(u-w)^T F_{ww}(t,w)(u-w) \quad \text{for every } (t,u) \in R_{++} \times R^m. \]

Newton's decrement is a continuous function from $R_{++} \times R^m$ into the set of nonnegative numbers such that, for each $t \in R_{++}$, $\lambda(t,\cdot)$ takes the minimal value $0$ over $R^m$ at $w \in R^m$ if and only if $w$ is the unique global minimizer of $F(t,\cdot)$ over $R^m$. Thus, for every $\kappa \in R_{++}$, the set

\[ N(\kappa) = \{ (t,w) \in R_{++} \times R^m : \lambda(t,w) \leq \kappa \} \]

forms a closed neighborhood of the trajectory

\[ \{ (t,w) \in R_{++} \times R^m : \lambda(t,w) = 0 \} = \{ (t,w) \in R_{++} \times R^m : w = \arg\min\{F(t,w) : w \in R^m\} \}, \]

which we want to trace numerically. Let $\kappa \in R_{++}$. Define the metric $\rho_\kappa$ on $R_{++}$ by

\[ \rho_\kappa(t'',t') = \frac{1}{2}\max\{ |\ln(\alpha(\tau')/\alpha(\tau''))| : \tau'', \tau' \in [t'',t'] \} + \kappa^{-1}\left| \int_{t''}^{t'} \theta(s)\,ds \right| + \left| \int_{t''}^{t'} \sigma(s)\,ds \right| \quad \text{for every } t', t'' \in R_{++}. \]

Let $\lambda^* = 2 - 3^{1/2} = 0.2679\ldots$. Define

\[ \pi(\lambda) = \begin{cases} (1+\lambda)^{-1} & \text{if } \lambda > \lambda^*, \\ 1 & \text{if } \lambda \leq \lambda^*, \end{cases} \qquad \omega(\lambda) = 1 - (1 - 3\lambda)^{1/3}. \]
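Anticipating the choice $\alpha(t) = t$ made in Theorem 5.4 below, the decrement and the damped Newton step for the family $\{\tilde f^p(t,\cdot)\}$ are one-liners on top of the derivative sketch from Section 4. A minimal sketch (our names):

```python
import numpy as np

def newton_decrement(t, grad, hess):
    """lambda(t,w)^2 = grad' hess^{-1} grad / alpha(t), with alpha(t) = t."""
    return np.sqrt(grad @ np.linalg.solve(hess, grad) / t)

def damped_newton_step(t, w, grad, hess):
    """One Newton iterate with the step length pi(lambda) defined above."""
    lam = newton_decrement(t, grad, hess)
    step = 1.0 if lam <= 2.0 - np.sqrt(3.0) else 1.0 / (1.0 + lam)
    return w - step * np.linalg.solve(hess, grad)
```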

Theorem 5.3. Assume that $\{F(t,\cdot) : t \in R_{++}\}$ is a globally self-concordant family of functions on $R^m$ with the positive parameter functions $\alpha, \theta, \sigma : R_{++} \to R_{++}$. Let $t^0 \in R_{++}$, $w^0 \in R^m$ and $\lambda^0 = \lambda(t^0,w^0)$. Let $\bar w \in R^m$ be the Newton iterate at $w^0$ with the step length $\pi(\lambda^0)$:

\[ \bar w = w^0 - \pi(\lambda^0)\, F_{ww}(t^0,w^0)^{-1} F_w(t^0,w^0). \]

(i) If $\lambda^0 > \lambda^*$ then
\[ F(t^0,\bar w) - F(t^0,w^0) \leq -\alpha(t^0)\left( \lambda^* - \ln(1+\lambda^*) \right) \leq -0.03\,\alpha(t^0). \]

(ii) If $\lambda^0 \leq \lambda^*$ then
\[ \lambda(t^0,\bar w) \leq \frac{(\lambda^0)^2}{(1-\lambda^0)^2} \ \begin{cases} = \lambda^0/2 & \text{if } \lambda^0 = \lambda^*, \\ < \lambda^0/2 & \text{if } \lambda^0 < \lambda^*, \end{cases} \]
\[ F(t^0,w^0) - \min\{F(t^0,w) : w \in R^m\} \leq \frac{1}{2}\alpha(t^0)\,\omega(\lambda^0)^2\,\frac{1+\omega(\lambda^0)}{1-\omega(\lambda^0)} \leq 4\lambda^0\alpha(t^0). \]

(iii) If $\lambda^0 < \kappa < \lambda^*$ and $\rho_\kappa(t'',t^0) \leq \kappa^{-1}(\kappa - \lambda^0)$ then $\lambda(t'',w^0) \leq \kappa$.

Proof: See Section 1 of the paper [10].

We have already seen that:

(a)' For every $t \in R_{++}$, the function $\tilde f^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with a positive definite Hessian matrix $\tilde f^p_{ww}(t,w)$ at every $w \in R^m$.
(b)' For every $t \in R_{++}$, there is a unique global minimizer of $\tilde f^p(t,\cdot)$ over $R^m$.
(f)' $\tilde f^p(\cdot,\cdot)$ is $C^\infty$ differentiable on $R_{++} \times R^m$.

Hence the following theorem is sufficient to establish that the family $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$ of functions on $R^m$ is globally self-concordant.

Theorem 5.4.

(d)' Define
\[ \alpha(t) = t, \qquad \theta(t) = \frac{\sqrt{n}}{t}, \qquad \sigma(t) = \frac{\sqrt{n}}{2t} \qquad \text{for every } t \in R_{++}. \]
Then the following relations hold for every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$.

(c)' $\left| \dfrac{d^3\tilde f^p(t,w+sh)}{ds^3}\Big|_{s=0} \right| \leq 2\alpha(t)^{-1/2} \left( \dfrac{d^2\tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{3/2}$.

(g)' $\left| \dfrac{d^2\tilde f^p(t,w+sh)}{dt\,ds}\Big|_{s=0} \right| \leq \theta(t)\,\alpha(t)^{1/2} \left( \dfrac{d^2\tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{1/2}$.

(h)' $\left| \dfrac{d^3\tilde f^p(t,w+sh)}{dt\,ds^2}\Big|_{s=0} \right| \leq 2\sigma(t) \left( \dfrac{d^2\tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)$.

Proof: (c)' By Theorem 4.3, we have

\[ \left( \frac{d^3\tilde f^p(t,w+sh)}{ds^3} \right)^2 = \left( \frac{2}{t^2}\sum_{j=1}^n u_j(t,s)^3 \right)^2 = \frac{4}{t^4}\left( \sum_{j=1}^n u_j(t,s)^3 \right)^2 \leq \frac{4}{t^4}\left( \sum_{j=1}^n u_j(t,s)^2 \right)^3 = \frac{4}{t}\left( \frac{d^2\tilde f^p(t,w+sh)}{ds^2} \right)^3. \]

Thus (c)' follows.

(g)' It follows from the definition of $v(t,s)$ that $\|v(t,s)\| \leq \sqrt{n}$. Hence we see by Theorem 4.3 that

\[ \left| \frac{d^2\tilde f^p(t,w+sh)}{dt\,ds} \right| = \left| \frac{u(t,s)^Tv(t,s)}{t} \right| \leq \frac{\sqrt{n}\,\|u(t,s)\|}{t} = \theta(t)\,\alpha(t)^{1/2}\left( \frac{d^2\tilde f^p(t,w+sh)}{ds^2} \right)^{1/2}. \]

Thus we have shown (g)'.

(h)' It is easily verified that

\[ -\sqrt{n} \leq 2v_i(t,s) - 1 \leq \sqrt{n} \quad \text{for } i = 1, 2, \ldots, n. \qquad (7) \]

Hence we see by Theorem 4.3 that

\[ \left| \frac{d^3\tilde f^p(t,w+sh)}{dt\,ds^2} \right| \leq \frac{\sqrt{n}}{t^2}\,u(t,s)^Tu(t,s) = \frac{\sqrt{n}}{t}\,\frac{d^2\tilde f^p(t,w+sh)}{ds^2} = 2\sigma(t)\,\frac{d^2\tilde f^p(t,w+sh)}{ds^2}. \]

Thus we have shown (h)'.

Now we are ready to apply Theorem 5.4 to the family $\{ \tilde f^p(t,\cdot) : t \in R_{++} \}$. Newton's decrement turns out to be

\[
\lambda(t,w) = \inf\Big\{ \lambda : \Big| \frac{d\tilde f^p(t,w+sh)}{ds}\Big|_{s=0} \Big| \leq \lambda\,\alpha(t)^{1/2}\Big( \frac{d^2\tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \Big)^{1/2} \ \text{for every } h \in R^m \Big\}
\]
\[
= \inf\Big\{ \lambda : \big|(a - A\tilde x(t,w))^Th\big| \leq \lambda \Big( h^T A\tilde X \big( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \big) \tilde XA^Th \Big)^{1/2} \ \text{for every } h \in R^m \Big\}
\]
\[
= \inf\Big\{ \lambda : \big|(A\tilde x(t,w) - a)^Th\big| \leq \lambda t^{1/2} \Big( h^T A\tilde\Lambda \big( I - \tilde\Lambda B^T(B\tilde\Lambda^2B^T)^{-1}B\tilde\Lambda \big) \tilde\Lambda A^Th \Big)^{1/2} \ \text{for every } h \in R^m \Big\}.
\]

Here $\tilde X = \tilde X(t,w)$ and $\tilde\Lambda = \tilde X(t,w)^{1/2}\tilde Z(t,w)^{-1/2}$. Let $0 < \kappa$ and $0 < t'' < t'$. Then

\[
\rho_\kappa(t'',t') = \frac{1}{2}\max\{ |\ln(\tau''/\tau')| : \tau'',\tau' \in [t'',t'] \} + \kappa^{-1}\left| \int_{t''}^{t'} \frac{\sqrt n}{s}\,ds \right| + \left| \int_{t''}^{t'} \frac{\sqrt n}{2s}\,ds \right|
= \frac{1}{2}\ln(t'/t'') + \frac{\sqrt n}{\kappa}(\ln t' - \ln t'') + \frac{\sqrt n}{2}(\ln t' - \ln t'').
\]

Thus we obtain

\[ \rho_\kappa(t'',t') = \left( \frac{1}{2} + \frac{\sqrt n}{\kappa} + \frac{\sqrt n}{2} \right)\ln(t'/t''). \qquad (8) \]

Theorem 5.5. Let $\bar\lambda = 1/4 < \lambda^* = 2 - 3^{1/2} = 0.2679\ldots$. Let $(t^0,w^0) \in R_{++} \times R^m$ and $\lambda^0 = \lambda(t^0,w^0)$. Let $\bar w \in R^m$ be the Newton iterate at $w^0$ with the step length $\pi(\lambda^0)$:

\[ \bar w = w^0 - \pi(\lambda^0)\,\tilde f^p_{ww}(t^0,w^0)^{-1}\tilde f^p_w(t^0,w^0). \]

(i) If $\lambda^0 > \lambda^*$ then $\tilde f^p(t^0,\bar w) - \tilde f^p(t^0,w^0) \leq -t^0(\lambda^* - \ln(1+\lambda^*)) \leq -0.03\,t^0$.

(ii) If $\lambda^0 \leq \lambda^*$ then
\[ \lambda(t^0,\bar w) \leq \frac{(\lambda^0)^2}{(1-\lambda^0)^2} \ \begin{cases} = \lambda^0/2 & \text{if } \lambda^0 = \lambda^*, \\ < \lambda^0/2 & \text{if } \lambda^0 < \lambda^*, \end{cases} \]
\[ \tilde f^p(t^0,w^0) - \min\{\tilde f^p(t^0,w) : w \in R^m\} \leq \frac{1}{2}t^0\,\omega(\lambda^0)^2\,\frac{1+\omega(\lambda^0)}{1-\omega(\lambda^0)} \leq 4\lambda^0 t^0. \]

(iii) If $\lambda^0 \leq \bar\lambda/2$ and $(1 - 1/(11\sqrt n))\,t^0 \leq t'' \leq t^0$ then $\lambda(t'',w^0) \leq \bar\lambda$.

Proof: The assertions (i) and (ii) follow directly from Theorems 5.3 and 5.4, so it suffices to show (iii). Assume that $\lambda^0 \leq \bar\lambda/2$ and $(1 - 1/(11\sqrt n))\,t^0 \leq t'' \leq t^0$. Then we see by (8) that

\[ \rho_{\bar\lambda}(t'',t^0) = \left( \frac{1}{2} + \frac{\sqrt n}{\bar\lambda} + \frac{\sqrt n}{2} \right)\ln(t^0/t'') \leq 5\sqrt n\,\ln\frac{1}{1 - 1/(11\sqrt n)} \leq 5\sqrt n\cdot\frac{1}{10\sqrt n} = \frac{1}{2} \leq \bar\lambda^{-1}(\bar\lambda - \lambda^0). \]

Thus the desired result follows from (iii) of Theorem 5.3.

6. Conceptual Algorithms.

In addition to the assumptions (A), (B) and (C) stated in the Introduction, we assume:

(D) For any $(t,w) \in R_{++} \times R^m$, we can compute the exact maximizer $\tilde x(t,w)$ of $f^p(t,w,\cdot)$ over $Q_{++}$ and the exact minimizer $(\tilde y(t,w),\tilde z(t,w))$ of $f^d(t,w,\cdot,\cdot)$ over $D_{++}(w)$.

We describe two algorithms based on Theorem 5.5. For each fixed $t \in R_{++}$, Algorithm 6.1 approximates the minimizer $w(t)$ of $\tilde f^p(t,\cdot)$ over $R^m$, while Algorithm 6.2 numerically traces the trajectory $\{(t,w(t)) : t \in R_{++}\}$ from a given initial point $(t^0,w^0)$ in the neighborhood $N(\bar\lambda/2) = \{(t,w) : \lambda(t,w) \leq \bar\lambda/2\}$ of the trajectory, where $\bar\lambda = 1/4$. A sketch of Algorithm 6.2 follows the two statements below.

Let $(t,w) \in R_{++} \times R^m$.

Algorithm 6.1.
Step 0: Let $q = 0$ and $w^0 = w$.
Step 1: Compute $(x^q,y^q,z^q) = (\tilde x(t,w^q), \tilde y(t,w^q), \tilde z(t,w^q))$. Let $\lambda^q = \lambda(t,w^q)$.
Step 2: Compute $w^{q+1} = w^q - \pi(\lambda^q)\,\tilde f^p_{ww}(t,w^q)^{-1}\tilde f^p_w(t,w^q)$.
Step 3: Replace $q$ by $q+1$, and go to Step 1.

Suppose $(t^0,w^0) \in N(\bar\lambda/2)$, where $\bar\lambda = 1/4$.

Algorithm 6.2.
Step 0: Let $\beta = 1/(11\sqrt n)$ and $r = 0$.
Step 1: Compute $(x^r,y^r,z^r) = (\tilde x(t^r,w^r), \tilde y(t^r,w^r), \tilde z(t^r,w^r))$. Let $\lambda^r = \lambda(t^r,w^r)$.
Step 2: Let $t^{r+1} = (1-\beta)t^r$. Compute $w^{r+1} = w^r - \tilde f^p_{ww}(t^{r+1},w^r)^{-1}\tilde f^p_w(t^{r+1},w^r)$.
Step 3: Replace $r$ by $r+1$, and go to Step 1.
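Here is a minimal numpy sketch of Algorithm 6.2, reusing gradient_and_hessian from the Section 4 sketch. The argument inner_solver stands for the exact inner optimizer $\tilde x(t,w)$ postulated by assumption (D); any routine with this interface (in practice, a primal-dual interior-point subroutine run to high accuracy) can be substituted. The stopping threshold t_min is ours, added for illustration.

```python
import numpy as np

def algorithm_6_2(t0, w0, inner_solver, A, B, a, t_min=1e-8):
    """Conceptual path following (Algorithm 6.2): shrink t geometrically and
    take one full Newton step in w per iteration."""
    n = A.shape[1]
    beta = 1.0 / (11.0 * np.sqrt(n))            # Step 0
    t, w = t0, np.asarray(w0, dtype=float)
    while t > t_min:
        t = (1.0 - beta) * t                    # Step 2: t^{r+1}
        x = inner_solver(t, w)                  # x~(t^{r+1}, w^r), by (D)
        grad, hess = gradient_and_hessian(t, x, A, B, a)
        w = w - np.linalg.solve(hess, grad)     # full Newton step in w
    return t, w
```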

In view of Theorem 5.5, we see that:

(i) The sequence $\{(t,w^q)\}$ generated by Algorithm 6.1 moves into the neighborhood $N(\bar\lambda/2) = \{(t,w) : \lambda(t,w) \leq \bar\lambda/2\}$ of the trajectory $\{(t,w(t)) : t \in R_{++}\}$ in a finite number of iterations, and eventually converges to $w(t)$. Hence Algorithm 6.1 provides us with an initial point for Algorithm 6.2.

(ii) The sequence $\{(t^r,w^r)\}$ generated by Algorithm 6.2 runs in the neighborhood $N(\bar\lambda/2) = \{(t,w) : \lambda(t,w) \leq \bar\lambda/2\}$ and satisfies

\[ t^{r+1} = \left( 1 - \frac{1}{11\sqrt n} \right) t^r \quad \text{for every } r = 0, 1, 2, \ldots; \qquad (9) \]

hence $\lim_{r\to\infty} t^r = 0$. The sequence $\{(x^r,w^r,y^r,z^r)\}$ lies in a neighborhood of the central trajectory of the primal-dual pair of linear programs $P^*$ and $D^*$, and if $(x^*,w^*,y^*,z^*)$ is a limit point of the sequence, then $x^*$ and $(w^*,y^*,z^*)$ are optimal solutions of $P^*$ and $D^*$, respectively.

We now assume that all elements of the data $A$, $B$, $a$, $b$ and $c$ are integers in order to analyze the computational complexity of Algorithms 6.1 and 6.2 in detail. Let $L$ denote the input size of $P^*$.

Lemma 6.3. Let $t > 0$.
(i) $\tilde f^p(t,0) \leq n2^{2L} + tnL$.
(ii) $\min_{w \in R^m} \tilde f^p(t,w) \geq -n2^{2L} - tnL$.

Proof: By (A) and (B) assumed in the Introduction and by the definition of $L$, we know that

\[ x_i \leq 2^L \leq \exp L \quad \text{for every } i = 1, 2, \ldots, n \text{ and every } x \in Q_{++}, \]
\[ \bar x_i \geq 2^{-L} \geq \exp(-L) \quad \text{for every } i = 1, 2, \ldots, n \text{ and some } \bar x \in P_{++}. \]

It follows that

\[ f^p(t,0,x) = c^Tx + t\sum_{j=1}^n \ln x_j \leq n(2^L)(2^L) + tnL = n2^{2L} + tnL \quad \text{for every } x \in Q_{++}, \]
\[ f^p(t,w,\bar x) = c^T\bar x + t\sum_{j=1}^n \ln \bar x_j \geq -n(2^L)(2^L) + tn(-L) = -n2^{2L} - tnL \quad \text{for every } w \in R^m, \]

where the $w$-term in $f^p(t,w,\bar x)$ vanishes because $A\bar x = a$. Hence

\[ \tilde f^p(t,0) = \max\{f^p(t,0,x) : x \in Q_{++}\} \leq n2^{2L} + tnL, \]
\[ \min_{w\in R^m} \tilde f^p(t,w) = \min_{w\in R^m} \max\{f^p(t,w,x) : x \in Q_{++}\} \geq \min_{w\in R^m} f^p(t,w,\bar x) \geq -n2^{2L} - tnL. \]

Theorem 6.4.
(i) Let $t = 2^{2L}$ and $w^0 = 0$. Then Algorithm 6.1 generates a $(t,w^q) \in N(\bar\lambda/2)$ in $O(nL)$ iterations.
(ii) Let $\varepsilon > 0$. Then Algorithm 6.2 generates a $(t^r,w^r) \in N(\bar\lambda/2)$ such that $0 < t^r < \varepsilon$ in $O(\sqrt n\,\ln(t^0/\varepsilon))$ iterations.

Proof: Let $q^* = \lceil 4nL/0.03 \rceil + 1$. Assume on the contrary that $(t,w^q) \notin N(\lambda^*)$ for $q = 0, 1, 2, \ldots, q^*$. By Lemma 6.3 and Theorem 5.5 we then see that

\[ -n2^{2L} - tnL \leq \tilde f^p(t,w^{q^*}) \leq \tilde f^p(t,w^0) - 0.03\,t\,q^* \leq n2^{2L} + tnL - 4nLt = n2^{2L} - 2nL\cdot 2^{2L} - tnL < -n2^{2L} - tnL. \]

This is a contradiction. Hence $(t,w^q) \in N(\lambda^*)$ for some $q \leq q^*$. By Theorem 5.5, we then see that $(t,w^{q+2}) \in N(\bar\lambda/2)$. Thus we have shown (i). The assertion (ii) follows directly from the equality (9).

7. Toward Practical Implementation.

Among the assumptions imposed in our discussion so far, the most impractical one seems to be the assumption:

(D) For any $(t,w) \in R_{++} \times R^m$, we can compute the exact maximizer $\tilde x(t,w)$ of $f^p(t,w,\cdot)$ over $Q_{++}$ and the exact minimizer $(\tilde y(t,w),\tilde z(t,w))$ of $f^d(t,w,\cdot,\cdot)$ over $D_{++}(w)$.

For every $(t,w) \in R_{++} \times R^m$, the Newton direction $(dx,dy,dz)$ toward a point on the central trajectory of the primal-dual pair $P(w)$ and $D(w)$ gives a search direction in the $(x,y,z)$-space, which has been utilized in many primal-dual feasible- and infeasible-interior-point methods ([5, 7, 8, 9, 11], etc.); $(dx,dy,dz)$ is given by

\[ B\,dx = -(Bx - b), \qquad B^T dy - dz = -(A^Tw + B^Ty - z - c), \qquad Z\,dx + X\,dz = -(Xz - te). \qquad (10) \]

In our framework, we have paid little attention to such search directions; instead we have assumed that the point $(\tilde x(t,w), \tilde y(t,w), \tilde z(t,w))$, into which all the flow induced by those search directions runs, is available.

To develop practical computational methods, we need to weaken the assumption (D); at the least we need to replace "exact" by "approximate" in (D). In Algorithms 6.1 and 6.2, the exact optimizers $x = \tilde x(t,w)$ and $(y,z) = (\tilde y(t,w),\tilde z(t,w))$ are necessary to compute the gradient $\tilde f^p_w(t,w) = a - Ax$ and the Hessian matrix

\[ \tilde f^p_{ww}(t,w) = A\Lambda\left( I - \Lambda B^T(B\Lambda^2B^T)^{-1}B\Lambda \right)\Lambda A^T, \]

with which we perform one Newton iteration to generate a new point in the $w$-space:

\[ w + \pi(\lambda(t,w))\,dw, \]

where

\[ H\,dw = Ax - a, \qquad (11) \]
\[ H = A\Lambda\left( I - \Lambda B^TG^{-1}B\Lambda \right)\Lambda A^T, \qquad G = B\Lambda^2B^T, \qquad \Lambda = (\mathrm{diag}\,x)^{1/2}(\mathrm{diag}\,z)^{-1/2}. \]

It should be noted, however, that even when the exact optimizers are unavailable, the search direction $dw$ above is well defined for every pair of $x > 0$ and $z > 0$. Thus, for every $(t,x,y,z) \in R_{++} \times R^n_{++} \times R^k \times R^n_{++}$, the correspondence $w \to dw$ defines a search direction in the $w$-space.
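A minimal numpy sketch of the $w$-space direction defined by (11). As noted above, it only requires some $x > 0$ and $z > 0$, not the exact inner optimizers; the dense factorizations are for illustration only, since in practice one would exploit the structure of $G = B\Lambda^2B^T$.

```python
import numpy as np

def dw_direction(x, z, A, B, a):
    """Solve H dw = Ax - a with H = A L (I - L B'(B L^2 B')^{-1} B L) L A',
    L = (diag x)^{1/2} (diag z)^{-1/2}, as in (11)."""
    lam = np.sqrt(x / z)               # diagonal of Lambda
    AL = A * lam                       # A Lambda (columns scaled)
    BL = B * lam                       # B Lambda
    G = BL @ BL.T                      # B Lambda^2 B'
    P = np.eye(len(x)) - BL.T @ np.linalg.solve(G, BL)
    H = AL @ P @ AL.T
    return np.linalg.solve(H, A @ x - a)
```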

Another critical question to be addressed is whether solving the system (11) for a search direction $dw$ in the $w$-space is essentially easier than one iteration of the primal-dual interior-point method applied to the original pair $P^*$ and $D^*$ without the decomposition techniques. To compute $dw$ by solving (11), we need

(i) a construction of the matrix $H = A\Lambda\left( I - \Lambda B^TG^{-1}B\Lambda \right)\Lambda A^T$, which requires an (implicit) inversion of the matrix $G = B\Lambda^2B^T$;
(ii) a solution of a system of equations $H\,dw = s$ for some $s \in R^m$.

On the other hand, given a current iterate $(x,w,y,z) \in R^n_{++} \times R^m \times R^k \times R^n_{++}$, the standard primal-dual interior-point method solves the following system of equations in the search direction $(dx,dw,dy,dz)$:

\[ A\,dx = -(Ax - a), \quad B\,dx = -(Bx - b), \quad A^Tdw + B^Tdy - dz = -(A^Tw + B^Ty - z - c), \quad X\,dz + Z\,dx = -(Xz - te), \qquad (12) \]

where $t \geq 0$. We can easily verify that the search direction $(dx,dw,dy,dz)$ can be represented as

\[
\begin{aligned}
H\,dw &= Ax - a - A\Lambda^2B^TG^{-1}(Bx - b) - A\Lambda\left( I - \Lambda B^TG^{-1}B\Lambda \right)(r + s), \\
G\,dy &= Bx - b - B\Lambda\left( \Lambda A^Tdw + r + s \right), \\
\Lambda\,dz &= \Lambda A^Tdw + \Lambda B^Tdy + r, \\
\Lambda^{-1}dx &= -\Lambda\,dz - s,
\end{aligned} \qquad (13)
\]

where $r = \Lambda(A^Tw + B^Ty - z - c)$ and $s = X^{-1/2}Z^{-1/2}(Xz - te)$. This representation (13) of the solution $(dx,dw,dy,dz)$ of (12) is called a compact-inverse implementation in the literature [2]. Hence, if we utilize the compact-inverse implementation (13), the majority of the computation of $(dx,dw,dy,dz)$ is likewise spent on (i) and (ii). Therefore solving (11) for a search direction $dw$ requires almost the same amount of computational work as one iteration of the compact-inverse implementation of the primal-dual interior-point method applied to the original pair $P^*$ and $D^*$ without the decomposition techniques. This observation is ironic in view of the nice theoretical results obtained above with the use of Newton's method, but it sheds new light on the compact-inverse implementation. In particular, we observe that when $x = \tilde x(t,w)$, $y = \tilde y(t,w)$ and $z = \tilde z(t,w)$, (13) reduces to

\[
\begin{aligned}
H\,dw &= Ax - a, \\
G\,dy &= -B\Lambda^2A^Tdw, \\
\Lambda\,dz &= \Lambda A^Tdw + \Lambda B^Tdy, \\
\Lambda^{-1}dx &= -\Lambda\,dz;
\end{aligned}
\]

hence the $dw$ determined by the compact-inverse implementation (13) is exactly the same as the search direction $dw$ in the $w$-space determined by (11).

We conclude the paper with a general idea of computational methods consisting of the following three steps:

Step xyz: Given $(t,w) \in R_{++} \times R^m$ and $(x,y,z) \in R^n_{++} \times R^k \times R^n_{++}$, compute a new iterate $(x',y',z')$ by $x' = x + \alpha_p\,dx$ and $(y',z') = (y,z) + \alpha_d\,(dy,dz)$, where $(dx,dy,dz)$ is given by (10) and $\alpha_p, \alpha_d \geq 0$ are step lengths.

Step w: Given $t > 0$, $(x,y,z) \in R^n_{++} \times R^k \times R^n_{++}$ and $w \in R^m$, choose a search direction $dw$ in the $w$-space and generate a new iterate $w'$ by $w' = w + \alpha\,dw$, where $\alpha > 0$ is a step length.

Step t: Decrease $t$.

Algorithms 6.1 and 6.2 may be regarded as ideal cases of such methods in which we always perform infinitely many iterations of Step xyz, so as to compute the exact optimizers $\tilde x(t,w)$ of $f^p(t,w,\cdot)$ and $(\tilde y(t,w),\tilde z(t,w))$ of $f^d(t,w,\cdot,\cdot)$, before we perform Step w with the search direction $dw$ determined by (11).

Acknowledgment. We would like to thank an anonymous referee of the paper [4], which had been submitted to Operations Research Letters. Although the paper was not accepted for publication, the referee raised some interesting and significant points. In particular, the referee suggested that we examine whether the real-valued function

\[ f(w) = \max\Big\{ w^T(Bx - b) + \sum_{j=1}^n \ln x_j : e^Tx = 1,\ x > 0 \Big\} \]

on $R^m$ is self-concordant. This question motivated the study of the subject of the current paper.

References

[1] J. R. Birge and L. Qi, "Computing block-angular Karmarkar projections with applications to stochastic programming," Management Science 34 (1988) 1472-1479.

[2] I. C. Choi and D. Goldfarb, "Exploiting special structure in a primal-dual path-following algorithm," Mathematical Programming 58 (1993) 33-52.

[3] M. W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra (Academic Press, New York, 1974).

[4] M. Kojima, N. Megiddo and S. Mizuno, "A Lagrangian relaxation method for approximating the analytic center of a polytope," Technical Report, IBM Almaden Research Center, San Jose, CA 95120-6099, USA, 1992.

[5] M. Kojima, S. Mizuno and A. Yoshise, "A primal-dual interior point algorithm for linear programming," in N. Megiddo, ed., Progress in Mathematical Programming: Interior-Point and Related Methods (Springer-Verlag, New York, 1989) 29-47.

[6] L. S. Lasdon, Optimization Theory for Large Systems (Macmillan, New York, 1970).

[7] I. J. Lustig, "Feasibility issues in a primal-dual interior-point method for linear programming," Mathematical Programming 49 (1990/91) 145-162.

[8] R. Marsten, R. Subramanian, M. Saltzman, I. J. Lustig and D. Shanno, "Interior point methods for linear programming: Just call Newton, Lagrange, and Fiacco and McCormick!," Interfaces 20 (1990) 105-116.

[9] N. Megiddo, "Pathways to the optimal set in linear programming," in N. Megiddo, ed., Progress in Mathematical Programming: Interior-Point and Related Methods (Springer-Verlag, New York, 1989) 131-158.

[10] Ju. E. Nesterov and A. S. Nemirovsky, "Self-concordant functions and polynomial-time methods in convex programming," Report, Central Economical and Mathematical Institute, USSR Academy of Sciences (Moscow, USSR, 1989).

[11] K. Tanabe, "Centered Newton method for mathematical programming," in M. Iri and K. Yajima, eds., System Modeling and Optimization (Springer-Verlag, New York, 1988) 197-206.

[12] M. J. Todd, "Exploiting special structure in Karmarkar's linear programming algorithm," Mathematical Programming 41 (1988) 97-113.

[13] M. J. Todd, "Recent developments and new directions in linear programming," in M. Iri and K. Tanabe, eds., Mathematical Programming: Recent Developments and Applications (Kluwer Academic Publishers, London, 1989) 109-157.

Appendix. Calculation of Derivatives of $\tilde f^p$.

Differentiating the identities in (5) with respect to $w \in R^m$, we first observe that

\[ B\tilde x_w(t,w) = O, \qquad A^T + B^T\tilde y_w(t,w) - \tilde z_w(t,w) = O, \qquad \tilde Z(t,w)\tilde x_w(t,w) + \tilde X(t,w)\tilde z_w(t,w) = O \qquad (14) \]

for every $(t,w) \in R_{++} \times R^m$. Differentiating the identities in (5) with respect to $t \in R_{++}$, we also see that

\[ B\tilde x_t(t,w) = 0, \qquad B^T\tilde y_t(t,w) - \tilde z_t(t,w) = 0, \qquad \tilde Z(t,w)\tilde x_t(t,w) + \tilde X(t,w)\tilde z_t(t,w) = e \qquad (15) \]

for every $(t,w) \in R_{++} \times R^m$. By solving (14) for $\tilde x_w(t,w)$, $\tilde y_w(t,w)$, $\tilde z_w(t,w)$ and (15) for $\tilde x_t(t,w)$, $\tilde y_t(t,w)$, $\tilde z_t(t,w)$, we obtain the following lemma.

Lemma A.
(i) For every $(t,w) \in R_{++} \times R^m$,
\[ \tilde x_w(t,w) = -\frac{1}{t}\tilde X\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)\tilde XA^T, \]
\[ \tilde y_w(t,w) = -(B\tilde X^2B^T)^{-1}B\tilde X^2A^T, \]
\[ \tilde z_w(t,w) = \tilde X^{-1}\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)\tilde XA^T. \]
(ii) For every $(t,w) \in R_{++} \times R^m$,
\[ \tilde x_t(t,w) = \frac{1}{t}\tilde X\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)e, \]
\[ \tilde y_t(t,w) = (B\tilde X^2B^T)^{-1}B\tilde Xe, \]
\[ \tilde z_t(t,w) = B^T(B\tilde X^2B^T)^{-1}B\tilde Xe. \]
Here $\tilde X = \tilde X(t,w)$ and $\tilde Z = \tilde Z(t,w)$.

Proof: (i) It follows from the second and third identities of (14) that
\[ \tilde x_w(t,w) = -\tilde X\tilde Z^{-1}\tilde z_w(t,w) = -\tilde X\tilde Z^{-1}\left( A^T + B^T\tilde y_w(t,w) \right). \]
Hence, by the first identity of (14),
\[ O = B\tilde x_w(t,w) = -B\tilde X\tilde Z^{-1}\left( A^T + B^T\tilde y_w(t,w) \right). \]
Therefore we obtain
\[
\begin{aligned}
\tilde y_w(t,w) &= -(B\tilde X\tilde Z^{-1}B^T)^{-1}B\tilde X\tilde Z^{-1}A^T = -(B\tilde X^2B^T)^{-1}B\tilde X^2A^T \quad (\text{since } \tilde X\tilde Z = tI), \\
\tilde z_w(t,w) &= A^T + B^T\tilde y_w(t,w) = A^T - B^T(B\tilde X^2B^T)^{-1}B\tilde X^2A^T = \tilde X^{-1}\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)\tilde XA^T, \\
\tilde x_w(t,w) &= -\tilde X\tilde Z^{-1}\tilde z_w(t,w) = -\frac{1}{t}\tilde X\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)\tilde XA^T.
\end{aligned}
\]

(ii) In view of the second and third identities of (15), we see that
\[ \tilde x_t(t,w) = \tilde Z^{-1}e - \tilde X\tilde Z^{-1}\tilde z_t(t,w) = \tilde Z^{-1}e - \tilde X\tilde Z^{-1}B^T\tilde y_t(t,w). \]
Hence, by the first identity of (15),
\[ 0 = B\tilde x_t(t,w) = B\tilde Z^{-1}e - B\tilde X\tilde Z^{-1}B^T\tilde y_t(t,w). \]
Therefore we obtain
\[
\begin{aligned}
\tilde y_t(t,w) &= (B\tilde X\tilde Z^{-1}B^T)^{-1}B\tilde Z^{-1}e = (B\tilde X^2B^T)^{-1}B\tilde Xe, \\
\tilde z_t(t,w) &= B^T(B\tilde X^2B^T)^{-1}B\tilde Xe, \\
\tilde x_t(t,w) &= \tilde Z^{-1}e - \tilde X\tilde Z^{-1}\tilde z_t(t,w) = \frac{1}{t}\tilde X\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)e.
\end{aligned}
\]

Proof of Theorem 4.1: By the definition,
\[
\begin{aligned}
\tilde f^p_w(t,w) &= \tilde x_w(t,w)^T f^p_x(t,w,\tilde x(t,w)) - (A\tilde x(t,w) - a) \\
&= \tilde x_w(t,w)^T\left( B^T\tilde y(t,w) \right) - (A\tilde x(t,w) - a) \quad (\text{by (5), } f^p_x(t,w,\tilde x(t,w)) = c - A^Tw + t\tilde X^{-1}e = B^T\tilde y(t,w)) \\
&= \left( B\tilde x_w(t,w) \right)^T\tilde y(t,w) - (A\tilde x(t,w) - a) \quad (\text{since } B\tilde x_w(t,w) = O \text{ by (14)}) \\
&= a - A\tilde x(t,w).
\end{aligned}
\]
Thus we have shown (i). By the definition and (i), we see that
\[ \tilde f^p_{ww}(t,w) = -A\tilde x_w(t,w). \]
Hence we obtain by (i) of Lemma A and $\tilde X = \sqrt t\,\tilde\Lambda$ that
\[ \tilde f^p_{ww}(t,w) = \frac{1}{t}A\tilde X\left( I - \tilde XB^T(B\tilde X^2B^T)^{-1}B\tilde X \right)\tilde XA^T = A\tilde\Lambda\left( I - \tilde\Lambda B^T(B\tilde\Lambda^2B^T)^{-1}B\tilde\Lambda \right)\tilde\Lambda A^T. \]
This completes the proof of Theorem 4.1.

Proof of Theorem 4.3: For simplicity of notation, we write $x(t,s)$ for $\tilde x(t,w+sh)$, $X$ for $X(t,s) = \tilde X(t,w+sh)$, $G$ for $BX^2B^T$, and $P$ for $I - XB^TG^{-1}BX$.

(i) By (i) of Theorem 4.1,
\[ \frac{d\tilde f^p(t,w+sh)}{ds} = \tilde f^p_w(t,w+sh)^Th = (a - Ax(t,s))^Th. \]
Thus we have shown (i).

(ii) By (ii) of Theorem 4.1,
\[ \frac{d^2\tilde f^p(t,w+sh)}{ds^2} = h^T\tilde f^p_{ww}(t,w+sh)h = \frac{1}{t}h^TAXPXA^Th = \frac{1}{t}u(t,s)^Tu(t,s). \]
Thus we have shown (ii).

(iii) We first observe that
\[ \frac{dx(t,s)}{ds} = \frac{d\tilde x(t,w+sh)}{ds} = \tilde x_w(t,w+sh)h = -\frac{1}{t}XPXA^Th = -\frac{1}{t}Xu(t,s). \]
On the other hand, we see by (ii) that
\[ \frac{d^3\tilde f^p(t,w+sh)}{ds^3} = \frac{2}{t}u(t,s)^T\frac{du(t,s)}{ds}. \]
To evaluate $du(t,s)/ds$, let
\[ q(t,s) = XA^Th \quad \text{and} \quad r(t,s) = G^{-1}BX^2A^Th. \]
Then
\[ Gr(t,s) = BX^2A^Th, \qquad u(t,s) = PXA^Th = q(t,s) - XB^Tr(t,s). \qquad (16) \]
It follows that
\[ \frac{dq(t,s)}{ds} = X_sA^Th, \qquad 2BXX_sB^Tr(t,s) + BX^2B^T\frac{dr(t,s)}{ds} = 2BXX_sA^Th. \]
Here $X_s = \mathrm{diag}\,(dx(t,s)/ds)$.

Hence
\[
\begin{aligned}
G\frac{dr(t,s)}{ds} &= 2BXX_sA^Th - 2BXX_sB^Tr(t,s) = 2BXX_sA^Th - 2BXX_sB^TG^{-1}BX^2A^Th = 2BX_sPXA^Th, \\
\frac{dr(t,s)}{ds} &= 2G^{-1}BX_sPXA^Th, \\
\frac{du(t,s)}{ds} &= \frac{dq(t,s)}{ds} - X_sB^Tr(t,s) - XB^T\frac{dr(t,s)}{ds} \\
&= X_sA^Th - X_sB^TG^{-1}BX^2A^Th - 2XB^TG^{-1}BX_sPXA^Th \\
&= \left( I - 2XB^TG^{-1}BX \right)X^{-1}X_sPXA^Th.
\end{aligned}
\]
Therefore
\[
\begin{aligned}
t\,\frac{d^3\tilde f^p(t,w+sh)}{ds^3} &= 2u(t,s)^T\frac{du(t,s)}{ds} = 2h^TAXP\left( I - 2XB^TG^{-1}BX \right)X^{-1}X_sPXA^Th \\
&= 2h^TAXPX^{-1}X_sPXA^Th = -2u(t,s)^T\left( \frac{1}{t}\,\mathrm{diag}\,u(t,s) \right)u(t,s) = -\frac{2}{t}\sum_{j=1}^n u_j(t,s)^3.
\end{aligned}
\]
Thus we have shown (iii).

(iv) By (i), we see that
\[ \frac{d^2\tilde f^p(t,w+sh)}{dt\,ds} = \frac{d\left( (a - Ax(t,s))^Th \right)}{dt} = -h^TA\tilde x_t(t,w+sh) = -\frac{1}{t}h^TAXPe = -\frac{1}{t}u(t,s)^Tv(t,s). \]

(v) By (ii), we see that
\[ \frac{d^3\tilde f^p(t,w+sh)}{dt\,ds^2} = \frac{d}{dt}\left( \frac{u(t,s)^Tu(t,s)}{t} \right) = \frac{2}{t}u(t,s)^T\frac{du(t,s)}{dt} - \frac{u(t,s)^Tu(t,s)}{t^2}. \]
Using $q(t,s)$ and $r(t,s)$, we represent $u(t,s)$ as in (16) to evaluate $du(t,s)/dt$. By the definitions of $q(t,s)$ and $r(t,s)$, we see that
\[ \frac{dq(t,s)}{dt} = X_tA^Th, \qquad 2BXX_tB^Tr(t,s) + BX^2B^T\frac{dr(t,s)}{dt} = 2BXX_tA^Th. \]

Here $X_t = \mathrm{diag}\,(dx(t,s)/dt)$. Hence
\[
\begin{aligned}
G\frac{dr(t,s)}{dt} &= 2BXX_tA^Th - 2BXX_tB^Tr(t,s) = 2BXX_tA^Th - 2BXX_tB^TG^{-1}BX^2A^Th = 2BX_tPXA^Th, \\
\frac{dr(t,s)}{dt} &= 2G^{-1}BX_tPXA^Th, \\
\frac{du(t,s)}{dt} &= \frac{dq(t,s)}{dt} - X_tB^Tr(t,s) - XB^T\frac{dr(t,s)}{dt} \\
&= X_tA^Th - X_tB^TG^{-1}BX^2A^Th - 2XB^TG^{-1}BX_tPXA^Th \\
&= \left( I - 2XB^TG^{-1}BX \right)X^{-1}X_tPXA^Th.
\end{aligned}
\]
Thus we obtain
\[ u(t,s)^T\frac{du(t,s)}{dt} = h^TAXP\left( I - 2XB^TG^{-1}BX \right)X^{-1}X_tPXA^Th = h^TAXPX^{-1}X_tPXA^Th. \]
By the definitions of $u(t,s)$, $x(t,s)$ and $X_t$ and by Lemma A, we know that
\[ u(t,s) = PXA^Th, \qquad \frac{dx(t,s)}{dt} = \tilde x_t(t,w+sh) = \frac{1}{t}Xv(t,s), \qquad X^{-1}X_t = X^{-1}\,\mathrm{diag}\left( \frac{dx(t,s)}{dt} \right) = \frac{1}{t}\,\mathrm{diag}\,v(t,s). \]
Therefore
\[ \frac{d^3\tilde f^p}{dt\,ds^2} = \frac{2}{t}u(t,s)^T\frac{du(t,s)}{dt} - \frac{u(t,s)^Tu(t,s)}{t^2} = \frac{2u(t,s)^T(\mathrm{diag}\,v(t,s))u(t,s) - u(t,s)^Tu(t,s)}{t^2} = \frac{u(t,s)^T\left( 2\,\mathrm{diag}\,v(t,s) - I \right)u(t,s)}{t^2}. \]
This completes the proof of Theorem 4.3.