
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 75, No. 1, OCTOBER 1992

TECHNICAL NOTE

On the Updating Scheme in a Class of Collinear Scaling Algorithms for Sparse Minimization 1

K. A. ARIYAWANSA 2 AND D. T. M. LAU 3

Communicated by F. Zirilli

Abstract. Sorensen (Ref. 1) has proposed a class of algorithms for sparse unconstrained minimization where the sparsity pattern of the Cholesky factors of the Hessian is known. His updates at each iteration depend on the choice of a vector, and in Ref. 1 the question of choosing this vector is essentially left open. In this note, we propose a variational problem whose solution may be used to choose this vector. The major part of the computation of a solution to this variational problem is similar to the computation of a trust-region step in unconstrained minimization. Therefore, well-developed techniques available for the latter problem can be used to compute this vector and to perform the updating.

Key Words. Quasi-Newton methods, collinear scalings, conic approximations, sparse Hessians.

1. Introduction

Consider the following unconstrained minimization problem:

(P) given $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2$, produce a sequence $\{x_k\} \subset \mathbb{R}^n$ that converges to a local minimizer $x_*$ of $f$.

1 This research was supported by NSF Grant DMS-8414460 and by DOE Grant DE-FG06-85ER25007, awarded to Washington State University, and by the Applied Mathematical Sciences Subprogram of the US Department of Energy under Contract W-31-109-Eng-38 while the first author was visiting the Mathematics and Computer Science Division of Argonne National Laboratory.

2 Associate Professor, Department of Pure and Applied Mathematics, Washington State University, Pullman, Washington.

3 Assistant Professor, Department of Mathematical Sciences, Mount Mercy College, Cedar Rapids, Iowa.

0022-3239/92/1000-0183$06.50/0 © 1992 Plenum Publishing Corporation


When $n$ is large, the Hessian $f''(x)$ of $f$ at $x$ is usually sparse, and its sparsity pattern is fixed for all $x$. It is also usually the case that the sparsity pattern of $f''(x)$ induces a sparsity pattern on the Cholesky factor $C(x)$ of $f''(x)$, whenever it exists. To be specific, let

$$\mathcal{J} := \{(i,j) : e_i^T C(x) e_j = 0,\ i,j = 1, 2, \dots, n,\ \forall x \in \mathbb{R}^n\}$$

be the set of indices specifying the fixed sparsity pattern of the Cholesky factors of the Hessian. Define

$$\mathcal{T} := \{M : M \in \mathbb{R}^{n \times n},\ e_i^T M e_j = 0,\ i < j;\ i,j = 1, 2, \dots, n\}$$

to be the subspace of $\mathbb{R}^{n \times n}$ of lower triangular matrices, and

$$\mathcal{L} := \{M : M \in \mathcal{T},\ e_i^T M e_j = 0,\ (i,j) \in \mathcal{J}\} \subseteq \mathcal{T}$$

to be the subspace of $\mathcal{T}$ with the sparsity pattern $\mathcal{J}$. For large problems, it is natural to use algorithms that exploit sparsity, and here we are concerned with algorithms for (P) that exploit the sparsity specified by $\mathcal{L}$.
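As an illustration (the helper names and the sample pattern below are ours, not the paper's), the index set $\mathcal{J}$ and the operation of zeroing a matrix onto the subspace $\mathcal{L}$ can be sketched as:

```python
import numpy as np

def forced_zero_set(pattern):
    """pattern[i][j] == 1 where the Cholesky factor may be nonzero (i >= j).
    Returns J = {(i, j) : e_i^T C e_j = 0 for every admissible factor C}."""
    n = len(pattern)
    return {(i, j) for i in range(n) for j in range(n)
            if i >= j and pattern[i][j] == 0}

def project_onto_L(M, J):
    """Zero out the strict upper triangle and the entries indexed by J."""
    P = np.tril(np.asarray(M, dtype=float))
    for (i, j) in J:
        P[i, j] = 0.0
    return P

# A 4x4 arrowhead-like lower-triangular pattern: full diagonal, full last row.
pattern = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 1, 0],
           [1, 1, 1, 1]]
J = forced_zero_set(pattern)
L = project_onto_L(np.ones((4, 4)), J)
```

Here $\mathcal{J}$ records the strictly lower-triangular positions forced to zero; only those positions are zeroed in the projection.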

Quasi-Newton methods for (P) begin with an estimate $x_0$ of $x_*$ and an approximation $B_0$ to $f''(x_0)$. They then generate $\{x_k\}$ by

$$x_{k+1} = x_k + \lambda_k d_k, \qquad (1a)$$

where the search direction $d_k$ is the solution of the system

$$B_k d_k = -f'(x_k), \qquad (1b)$$

and $\lambda_k$ is the steplength, $k = 1, 2, \dots$. The current Hessian approximant $B_k$ is then updated to $B_{k+1}$ so that the quasi-Newton condition,

$$B_{k+1} s_k = y_k, \qquad s_k := x_{k+1} - x_k, \qquad y_k := f'(x_{k+1}) - f'(x_k), \qquad (1c)$$

is satisfied. For $\mathcal{L} := \mathcal{T}$ (the case we refer to as the dense case), Zhang and Tewarson (Ref. 2) have proposed a variational problem for updating a factorization $B_k = L_k L_k^T$, $L_k \in \mathcal{T}$, to $B_{k+1} = L_{k+1} L_{k+1}^T$. To be specific, let $|\cdot|$ denote a norm on $\mathbb{R}^{n \times n}$; and, for $p, q \in \mathbb{R}^n$, let

$$\mathcal{A}(p, q) := \{M : M \in \mathbb{R}^{n \times n},\ Mq = p\},$$

$$\mathcal{A}^T(p, q) := \{M : M^T \in \mathcal{A}(p, q)\}.$$

The variational problem of Zhang and Tewarson (Ref. 2), specified when $s_k^T y_k > 0$, is

$$\bar{L}_{k+1} := \arg\min_L \{|L - L_k| : L \in \mathbb{R}^{n \times n},\ L L^T \in \mathcal{A}(y_k, s_k)\}. \qquad (2)$$

$L_{k+1} \in \mathcal{T}$ is obtained using $\bar{L}_{k+1}$ as indicated below.

JOTA: VOL. 75, NO. 1, OCTOBER 1992 185

Zhang and Tewarson then show that, when $|\cdot|$ is a certain weighted Frobenius norm (see Ref. 2, p. 510), (2) is equivalent to first solving

$$\bar{v}_k := \arg\min_v \{|L_*(v) - L_k| : v \in \mathbb{R}^n,\ v^T v = s_k^T y_k\}, \qquad (3a)$$

where

$$L_*(v) := \arg\min_L \{|L - L_k| : L \in \mathbb{R}^{n \times n},\ L \in \mathcal{A}^T(v, s_k) \cap \mathcal{A}(y_k, v)\}, \qquad (3b)$$

and then letting

$$\bar{L}_{k+1} := L_*(\bar{v}_k). \qquad (3c)$$

In Ref. 2, it is also shown that (3a) reduces to

$$\bar{v}_k := \arg\min_v \{\theta(v) : v \in \mathbb{R}^n,\ v^T v = s_k^T y_k\}, \qquad (4)$$

where $\theta : \mathbb{R}^n \to \mathbb{R}$ is a quadratic function. Problem (4) is related to the problem solved in calculating a trust-region step in unconstrained minimization (Refs. 3, 4). Of course, $\bar{L}_{k+1} \in \mathcal{T}$ may not hold for solutions of (2) or, equivalently, of (3). However, Zhang and Tewarson (Ref. 2) show that $\bar{L}_{k+1} - L_k$ has at most rank 2; therefore, it is possible to obtain $L_{k+1} \in \mathcal{T}$ such that $L_{k+1} L_{k+1}^T = \bar{L}_{k+1} \bar{L}_{k+1}^T$ in $O(n^2)$ operations (Ref. 1, pp. 140-141).
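The role of the intermediate vector $v$ in (3b) can be checked numerically: any matrix satisfying both $L^T s = v$ and $L v = y$, with $v^T v = s^T y$, yields $B = L L^T$ satisfying the quasi-Newton condition (1c), since $L L^T s = L v = y$. A minimal sketch (the rank-one construction of the factor is our own choice, purely for illustration; it is not the least-change solution of (3b)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if s @ y < 0:            # synthetic data: enforce the curvature condition s^T y > 0
    y = -y

# choose v with v^T v = s^T y (any such v works; a scaled copy of y is simplest)
v = y * np.sqrt(s @ y) / np.linalg.norm(y)

# a rank-one matrix A satisfying both A v = y and A^T s = v
A = np.outer(y, v) / (v @ v)

B = A @ A.T
# the two factor equations imply the quasi-Newton condition B s = y
assert np.allclose(A @ v, y)
assert np.allclose(A.T @ s, v)
assert np.allclose(B @ s, y)
```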

In an earlier paper, Dennis and Vu (Ref. 5) have essentially stated the generalization of (2) for $\mathcal{L} \subset \mathcal{T}$ (the case we refer to as the sparse case) as follows (Ref. 5, Problem 1):

$$\bar{L}_{k+1} := \arg\min_L \{|L - L_k| : L \in \mathcal{L},\ L L^T \in \mathcal{A}(y_k, s_k)\}. \qquad (5)$$

The norm $|\cdot|$ that Dennis and Vu (Ref. 5) use in (5) is the Frobenius norm. They then state that "a computationally viable solution to the problem is not obvious" (Ref. 5, p. 3) and discuss obtaining $L_{k+1}$ as

$$L_{k+1} := \arg\min_L \{|L - L_k| : L \in \mathcal{A}(y_k, v_k) \cap \mathcal{L} \cap \mathcal{A}^T(v_k, s_k)\} \qquad (6)$$

for a given $v_k \in \mathbb{R}^n$ such that $v_k^T v_k = s_k^T y_k$ (Ref. 5, Problem 2).

Dennis and Vu (Ref. 5, p. 14) state that the choice of a suitable $v_k$ to be used in (6) is the main obstacle in their approach. It is possible to write down the sparse analogues of Eqs. (3), to discuss their relationship to (5), and to explore the possibility of using them to compute a suitable $v_k$ for use in (6). This, of course, would be an attempt to generalize the work of Zhang and Tewarson (Ref. 2) to the sparse case.

In a still earlier paper, however, Sorensen (Ref. 1, pp. 149, 150, 152, 153) very convincingly argues that updates satisfying sparsity, the quasi-Newton condition, and positive definiteness may lead to poor approximations of the actual Hessian. Specifically, his arguments (Ref. 1, comments following Lemma 5.5) imply that

$$L_{k+1} \in \{L : L \in \mathcal{A}(y_k, v_k) \cap \mathcal{L} \cap \mathcal{A}^T(v_k, s_k)\},$$

assuming that this set is nonempty, may be such that $L_{k+1} L_{k+1}^T$ is an arbitrarily poor approximation of $f''(x)$ for $x$ near $x_k$ when $x_{k+1} \neq x_k$ is arbitrarily close to $x_k$. Crucial to these arguments is the fact that such $L_{k+1} \in \mathcal{L}$ is chosen attempting to satisfy both the equations $L_{k+1} v_k = y_k$ and $L_{k+1}^T s_k = v_k$.

Sorensen (Ref. 1, Section 6) then demonstrates that collinear scaling algorithms (see also Refs. 6, 7, 8) may be designed to address the above difficulty. Collinear scalings are natural generalizations of the affine scalings on which quasi-Newton methods with factored updates are based. The two equations

$$L_{k+1} v_k = y_k \quad \text{and} \quad L_{k+1}^T s_k = v_k$$

imply the quasi-Newton condition (1c), which in turn ensures that the underlying local quadratic approximant to $f$ at $x_{k+1}$ interpolates $f'$ at $x_{k+1}$ and $x_k$. In the context of collinear scaling algorithms, the underlying local approximant at $x_{k+1}$ is a conic function (Ref. 6), which is a natural generalization of a quadratic function. Using the additional degrees of freedom available in this setting, Sorensen (Ref. 1) was able to ensure that the underlying local conic approximant at $x_{k+1}$ interpolates not only $f'$ but also $f$ at $x_{k+1}$ and $x_k$, with $L_{k+1} \in \mathcal{L}$ chosen to satisfy a single equation of the form $L_{k+1}^T \sigma_k = \bar{v}_k$. Here, $\bar{v}_k \in \mathbb{R}^n$ is to be chosen so that $\bar{v}_k^T \bar{v}_k = 2\rho_k$, while $\sigma_k \in \mathbb{R}^n$ and $\rho_k > 0$ are available when the update is about to be performed. Since the update $L_{k+1} \in \mathcal{L}$ is chosen to satisfy only the single equation $L_{k+1}^T \sigma_k = \bar{v}_k$, Sorensen (Ref. 1, Section 6) argues that the resulting algorithm may not suffer from the shortcoming of sparse quasi-Newton methods mentioned in the previous paragraph.

The purpose of this note is to propose a variational problem, similar in spirit to (3), for choosing $\bar{v}_k$ and for obtaining $L_{k+1}$, in the scheme proposed by Sorensen (Ref. 1, Section 6). It turns out that the major part of the computation of a solution to this variational problem is the solution of a problem similar to (4). As mentioned earlier, (4) is related to the problem solved in calculating a trust-region step in unconstrained minimization; therefore, by making use of the well-developed techniques for computing trust-region steps (see Refs. 3, 4, for example), it is possible to develop a fast procedure for computing $\bar{v}_k$ and obtaining $L_{k+1}$.


2. Updating Scheme

We begin with a brief description of the updating scheme at the $k$th iterate of the collinear scaling algorithm that Sorensen (Ref. 1, Section 6) proposed. The notation here is the one used by Ariyawansa and Lau (Ref. 9). Suppose that we are at the current iterate $x_k$, and that a nonsingular $L_k \in \mathcal{L}$ approximating $C(x_k)$ and $h_k \in \mathbb{R}^n$ are available to us. Now, suppose that we take the local collinear scaling $S_k : w \mapsto x_k + L_k^{-T} w/(1 + h_k^T w)$ of the domain of $f$,

$$x = x_k + L_k^{-T} w/(1 + h_k^T w), \qquad w \in N_{W_k}(0), \qquad (7a)$$

and then make the local quadratic approximation $V_k$ to $\psi_k := f \circ S_k$ as defined below:

$$\psi_k(w) := f(x_k + L_k^{-T} w/(1 + h_k^T w)) \approx V_k(w) := \psi_k(0) + [\psi_k'(0)]^T w + (1/2) w^T w = f(x_k) + [L_k^{-1} f'(x_k)]^T w + (1/2) w^T w, \qquad w \in N_{W_k}(0). \qquad (7b)$$

In (7),

$$W_k := \{w : w \in \mathbb{R}^n,\ 1 + h_k^T w \neq 0\}$$

is the domain of $S_k$, and $N_{W_k}(0)$ is a suitable neighborhood of $0 \in W_k$. The aim now is to use (7), compute the next iterate $x_{k+1}$, and then to update $S_k$ and $V_k$ by updating $h_k$ and $L_k$, respectively. Issues that need to be considered when computing $x_{k+1}$ using (7) are discussed by Ariyawansa (Ref. 8, Section 2). Our attention here is focussed on the updating process rather than on computing $x_{k+1}$. Therefore, suppose that we have computed $x_{k+1} \neq x_k$. Sorensen (Ref. 1) performs the updating by enforcing

$$\psi_{k+1}(0) = V_{k+1}(0), \qquad (8a)$$

$$\psi_{k+1}'(0) = V_{k+1}'(0), \qquad (8b)$$

$$\psi_{k+1}(-\bar{v}_k) = V_{k+1}(-\bar{v}_k), \qquad (8c)$$

$$\psi_{k+1}'(-\bar{v}_k) = V_{k+1}'(-\bar{v}_k), \qquad (8d)$$

where $\bar{v}_k$ is such that $S_{k+1}(-\bar{v}_k) = x_k$. As shown by Sorensen (Ref. 1), if we computed $x_{k+1}$ so that

$$s_k^T f'(x_k) < 0, \qquad f(x_k) > f(x_{k+1}),$$

$$[f(x_k) - f(x_{k+1})]^2 - s_k^T f'(x_k) \, s_k^T f'(x_{k+1}) > 0,$$


where $s_k := x_{k+1} - x_k$, and if we choose any $\bar{v}_k$ so that

$$\bar{v}_k^T \bar{v}_k = 2\rho_k := 2[(f(x_k) - f(x_{k+1}))^2 - s_k^T f'(x_k) \, s_k^T f'(x_{k+1})]^{1/2},$$

then conditions (8) are satisfied by $h_{k+1}$ and $L_{k+1}$ such that

$$L_{k+1}^T \sigma_k = \bar{v}_k, \qquad (9a)$$

$$h_{k+1} = [L_{k+1}^{-1}(\gamma_k f'(x_{k+1}) - f'(x_k)) - \gamma_k \bar{v}_k]/(s_k^T f'(x_k)). \qquad (9b)$$

In (9),

$$\gamma_k := -s_k^T f'(x_k)/(f(x_k) - f(x_{k+1}) + \rho_k) > 0,$$

$$\sigma_k := \gamma_k s_k \neq 0, \qquad r_k := \gamma_k f'(x_{k+1}) - (1/\gamma_k) f'(x_k).$$

Note that one way of selecting $L_{k+1}$ and $h_{k+1}$ to satisfy (9) is to first select $L_{k+1}$ so that

$$L_{k+1}^T \sigma_k = \bar{v}_k \quad \text{and} \quad L_{k+1} \bar{v}_k = r_k,$$

and then to set

$$h_{k+1} := [\gamma_k(1 - \gamma_k)/(s_k^T f'(x_k))] L_{k+1}^{-1} f'(x_{k+1}).$$

In the class of dense collinear scaling algorithms given by Sorensen (Ref. 1, Section 6), precisely this choice is made. However, if we also require that $L_{k+1} \in \mathcal{L}$, then we would encounter the difficulties mentioned in Section 1 when attempting to choose $L_{k+1} \in \mathcal{L}$ satisfying both $L_{k+1}^T \sigma_k = \bar{v}_k$ and $L_{k+1} \bar{v}_k = r_k$. Making this observation, Sorensen (Ref. 1, Section 6) suggests the following alternative way of choosing $L_{k+1}$ and $h_{k+1}$ to satisfy (9):

choose any $\bar{v}_k \in \mathbb{R}^n$ so that $\bar{v}_k^T \bar{v}_k = 2\rho_k > 0$; $\qquad$ (10a)

choose nonsingular $L_{k+1} \in \mathcal{L}$ so that $L_{k+1}^T \sigma_k = \bar{v}_k$; $\qquad$ (10b)

find $u_k$ such that $L_{k+1} u_k = \gamma_k f'(x_{k+1}) - f'(x_k)$; $\qquad$ (10c)

set $h_{k+1} := (u_k - \gamma_k \bar{v}_k)/(s_k^T f'(x_k))$. $\qquad$ (10d)

Note that, since (9b) is equivalent to

$$L_{k+1}[(s_k^T f'(x_k)) h_{k+1} + \gamma_k \bar{v}_k] = \gamma_k f'(x_{k+1}) - f'(x_k),$$

(10c) and (10d) ensure that (9b) is satisfied.

The updating scheme (10) would be complete if we specify $\bar{v}_k$ and indicate how we select $L_{k+1} \in \mathcal{L}$ in (10b). Sorensen (Ref. 1) addresses these two questions only briefly. We now specify a procedure for computing $\bar{v}_k$ and for obtaining $L_{k+1}$, similar to the procedure (3) of Zhang and Tewarson (Ref. 2) for quasi-Newton methods in the dense case.
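Once $\bar{v}_k$ has been chosen, steps (10a)-(10d) are plain linear algebra. The sketch below is our own illustration, not the authors' code: it uses a synthetic convex $f$, a step $x_{k+1}$ for which the conditions preceding (9) hold, and, for (10b), the convenient feasible choice $L_{k+1} := \alpha L_k$ with $\bar{v}_k := \alpha L_k^T \sigma_k$; constructing a genuinely sparse least-change $L_{k+1} \in \mathcal{L}$ is exactly what the rest of the note addresses.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def f(x):  return 0.25 * (x @ x) ** 2 + 0.5 * x @ x   # convex test function
def fp(x): return (x @ x) * x + x                      # its gradient

xk = rng.standard_normal(n)
xk1 = 0.5 * xk                 # a step toward the minimizer 0; f decreases
s = xk1 - xk
L = np.tril(rng.standard_normal((n, n))) + 2.0 * np.eye(n)

# quantities available when the update is performed
rho = np.sqrt((f(xk) - f(xk1)) ** 2 - (s @ fp(xk)) * (s @ fp(xk1)))
gamma = -(s @ fp(xk)) / (f(xk) - f(xk1) + rho)
sigma = gamma * s

# (10a): any v with v^T v = 2 rho; here v = alpha L^T sigma
alpha = np.sqrt(2 * rho / (sigma @ (L @ (L.T @ sigma))))
v = alpha * (L.T @ sigma)

# (10b): a nonsingular lower-triangular L_plus with L_plus^T sigma = v
L_plus = alpha * L

# (10c): solve L_plus u = gamma f'(x_{k+1}) - f'(x_k)
u = np.linalg.solve(L_plus, gamma * fp(xk1) - fp(xk))

# (10d): the updated horizon vector
h_plus = (u - gamma * v) / (s @ fp(xk))

assert np.isclose(v @ v, 2 * rho)
assert np.allclose(L_plus.T @ sigma, v)
```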


For convenience, henceforth in this section we drop the subscript $k$ and, in place of the subscript $k+1$, use the subscript $+$. To state our updating procedure more specifically, let nonsingular $M_j \in \mathbb{R}^{n \times n}$, $j = 1, 2, \dots, n$, be given; and, for a given $v \in \mathbb{R}^n$, let the set $\mathcal{S}(v) \subseteq \mathcal{L}$ be defined by

$$\mathcal{S}(v) := \{M : M \in \mathcal{L},\ M^T \sigma = v\}.$$

Our updating procedure then is to choose $\bar{v}$ by

$$\bar{v} := \arg\min_v \Big\{ \sum_{j=1}^n \|M_j(L_*(v) - L)e_j\|^2 : v \in \mathbb{R}^n,\ v^T v = 2\rho,\ \mathcal{S}(v) \neq \emptyset \Big\}, \qquad (11a)$$

where, for any $v$ such that $\mathcal{S}(v) \neq \emptyset$,

$$L_*(v) := \arg\min_{\tilde{L}} \Big\{ \sum_{j=1}^n \|M_j(\tilde{L} - L)e_j\|^2 : \tilde{L} \in \mathcal{S}(v) \Big\}, \qquad (11b)$$

and then to let

$$L_+ := L_*(\bar{v}). \qquad (11c)$$

In (11) and in the rest of this note, $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^n$. Observe that (11a) and (11b) are well defined, since there exist $v \in \mathbb{R}^n$ with $v^T v = 2\rho$ and $\mathcal{S}(v) \neq \emptyset$. For instance, if

$$v := \alpha L^T \sigma, \quad \text{with } \alpha := \sqrt{2\rho/(\sigma^T L L^T \sigma)},$$

then $M := \alpha L \in \mathcal{S}(v)$, since $L \in \mathcal{L}$.

Before proceeding, we need to introduce two notational conventions that we use in the rest of this note. For any given $x \in \mathbb{R}^n$ and $j \in \{1, 2, \dots, n\}$, we define $x^{(j)} \in \mathbb{R}^n$ by

$$e_i^T x^{(j)} := \begin{cases} 0, & \text{if } (i,j) \in \mathcal{J}, \\ e_i^T x, & \text{otherwise}, \end{cases}$$

for $i = 1, 2, \dots, n$. For any given $a \in \mathbb{R}$, we define $a^+ \in \mathbb{R}$ by

$$a^+ := 1/a, \quad \text{if } a \neq 0, \qquad a^+ := 0, \quad \text{otherwise}.$$
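In code, the two conventions read as follows (the helper names are ours):

```python
import numpy as np

def masked(x, j, J):
    """x^{(j)}: copy of x with entries i such that (i, j) in J set to zero."""
    out = np.array(x, dtype=float)
    for i in range(len(out)):
        if (i, j) in J:
            out[i] = 0.0
    return out

def pinv_scalar(a):
    """a^+ : reciprocal of a when a != 0, and 0 otherwise."""
    return 1.0 / a if a != 0 else 0.0

# a small example with the 3x3 forced-zero set {(1,0), (2,0), (2,1)}
J = {(1, 0), (2, 0), (2, 1)}
x = np.array([1.0, 2.0, 3.0])
```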

In order to examine solutions to (11a), let us define $\theta : \mathbb{R}^n \to \mathbb{R}$ by

$$\theta(v) := \sum_{j=1}^n \|M_j(L_*(v) - L)e_j\|^2.$$


When $\mathcal{S}(v) \neq \emptyset$ and $q_j := M_j^{-1} M_j^{-T} \sigma^{(j)}$ has the sparsity pattern of the $j$th column of $L \in \mathcal{L}$, $j = 1, 2, \dots, n$, the solution to (11b) is given by Sorensen (Ref. 1, Lemma 5.2) as

$$L_*(v) = L + \sum_{j=1}^n [q_j^T \sigma^{(j)}]^+ \, [e_j^T(v - L^T \sigma)] \, q_j e_j^T. \qquad (12)$$

Using (12) and

$$q_j^T \sigma = q_j^T \sigma^{(j)} = \|M_j^{-T} \sigma^{(j)}\|^2, \qquad j = 1, 2, \dots, n,$$

we can express $\theta(v)$ as follows:

$$\theta(v) = \sum_{j=1}^n \|M_j(L_*(v) - L)e_j\|^2 = \sum_{j=1}^n \big\| [q_j^T \sigma^{(j)}]^+ [e_j^T(v - L^T \sigma)] M_j q_j \big\|^2 = \sum_{j=1}^n [\|M_j^{-T} \sigma^{(j)}\|^2]^+ [e_j^T(v - L^T \sigma)]^2 = (v - L^T \sigma)^T D (v - L^T \sigma). \qquad (13)$$

In (13), the diagonal matrix $D \in \mathbb{R}^{n \times n}$ is defined by

$$e_j^T D e_j := [\|M_j^{-T} \sigma^{(j)}\|^2]^+ \geq 0, \qquad j = 1, 2, \dots, n.$$

Therefore, the problem

$$\hat{v} := \arg\min\{\theta(v) : v \in \mathbb{R}^n,\ v^T v = 2\rho\}, \qquad (14)$$

obtained by ignoring the constraint $\mathcal{S}(v) \neq \emptyset$ in (11a), is related to the problem

$$p := \arg\min_w \{f + g^T w + (1/2) w^T B w : w^T w \leq \Delta^2\},$$

where $f, \Delta \in \mathbb{R}$, $g \in \mathbb{R}^n$, and symmetric $B \in \mathbb{R}^{n \times n}$, studied by Sorensen (Ref. 4) and others (see the references in Ref. 4). Consequently, using the results of Sorensen (Ref. 4), it is possible to write down the form of the solution to problem (14). It turns out that the solution $\bar{v}$ of (14) is such that $\mathcal{S}(\bar{v}) \neq \emptyset$. Therefore, such $\bar{v}$ solves (11a) as well. We have the following theorem, whose proof is based on the above ideas.


Theorem 2.1. Suppose that $L \in \mathcal{L}$ and $M_j$, $j = 1, 2, \dots, n$, are nonsingular, and that $\sigma \neq 0$. Let $d_{\min} := \min\{d_j,\ j = 1, 2, \dots, n\}$, where $d_j := e_j^T D e_j$, $j = 1, 2, \dots, n$. Then, the unique solution $\bar{v}$ of (11a) is given by

$$\bar{v} = (D + \lambda_* I)^{-1} D L^T \sigma, \qquad (15a)$$

where $\lambda_* \in (-d_{\min}, \infty)$ is the unique root of the equation

$$h(\lambda) := \sum_{j=1}^n \gamma_j/(d_j + \lambda)^2 - 1 = 0, \qquad (15b)$$

with

$$\gamma_j := (d_j e_j^T L^T \sigma)^2/(2\rho), \qquad j = 1, 2, \dots, n.$$

Proof. We write

$$\theta(v) = v^T D v - 2(D L^T \sigma)^T v + (L^T \sigma)^T D (L^T \sigma)$$

and make use of the results of Sorensen (Ref. 4) to first consider solutions of (14). To that end, we first show that $D L^T \sigma \neq 0$. If $D$ is nonsingular, then $D L^T \sigma \neq 0$, since $L$ is nonsingular and $\sigma \neq 0$. Consider then the case where $D$ is singular. Now,

$$D L^T \sigma = \sum_{j=1}^n d_j [(L e_j)^T \sigma] e_j = \sum_{j=1}^n d_j [(L e_j)^T \sigma^{(j)}] e_j.$$

Since

$$d_j = [\|M_j^{-T} \sigma^{(j)}\|^2]^+,$$

with $M_j$ nonsingular, $d_j = 0$ if and only if $\sigma^{(j)} = 0$, $j = 1, 2, \dots, n$. It follows that $D L^T \sigma = 0$ is possible only if

$$(L e_j)^T \sigma^{(j)} = (L e_j)^T \sigma = 0, \qquad j = 1, 2, \dots, n,$$

or equivalently only if $L^T \sigma = 0$. Since $L$ is nonsingular and $\sigma \neq 0$, we must therefore conclude that $D L^T \sigma \neq 0$.

By Ref. 4, Lemma 2.8, Part (ii), and the statement on uniqueness in that lemma, we note that, if there is a $\bar{v} \in \mathbb{R}^n$ and a $\lambda_* \in (-d_{\min}, \infty)$ so that

$$(D + \lambda_* I)\bar{v} = D L^T \sigma \quad \text{and} \quad \bar{v}^T \bar{v} = 2\rho,$$

then that $\bar{v}$ is the unique solution to (14). We then observe that these two relations imply that $\lambda_*$ must be a root of $h(\lambda)$ of (15b). Noting that, for $\lambda \in (-d_{\min}, \infty)$, $h(\lambda) \to \infty$ as $\lambda \to -d_{\min}$, $h(\lambda) \to -1$ as $\lambda \to \infty$, and that

$$h'(\lambda) = -2 \sum_{j=1}^n \gamma_j/(d_j + \lambda)^3 < 0,$$

since $D L^T \sigma \neq 0$, we see that $h$ of (15b) has a unique root $\lambda_* \in (-d_{\min}, \infty)$. We complete the proof by noting that, when $\bar{v}$ is given by (15a), $\mathcal{S}(\bar{v}) \neq \emptyset$, since, for example, with $M := L D(D + \lambda_* I)^{-1} \in \mathcal{L}$, $M^T \sigma = \bar{v}$.
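The computation in Theorem 2.1 reduces to finding the root of the scalar secular equation (15b) and then forming $\bar{v}$ from (15a). The following is a minimal sketch of that computation (our own code, with synthetic data and a strictly positive stand-in for the diagonal of $D$; in general some $d_j$ may vanish), using a Newton iteration started at a point where $h \geq 0$ so that the iterates increase monotonically toward the root:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
L = np.tril(rng.standard_normal((n, n))) + 2.0 * np.eye(n)
sigma = rng.standard_normal(n)
rho = 1.5
d = rng.random(n) + 0.1          # stand-in for diag(D), assumed positive here

w = d * (L.T @ sigma)            # w = D L^T sigma, componentwise
gam = w ** 2 / (2 * rho)         # gamma_j = (d_j e_j^T L^T sigma)^2 / (2 rho)

def h(lam):  return np.sum(gam / (d + lam) ** 2) - 1.0
def hp(lam): return -2.0 * np.sum(gam / (d + lam) ** 3)

# start where h(lam) >= 0: at lam = max_j (sqrt(gamma_j) - d_j), the
# maximizing term alone equals 1, so h >= 0, and lam > -min(d)
lam = np.max(np.sqrt(gam) - d)
for _ in range(100):             # Newton on the convex decreasing h
    step = -h(lam) / hp(lam)
    lam += step
    if abs(step) < 1e-14:
        break

v_bar = w / (d + lam)            # (15a): (D + lam I)^{-1} D L^T sigma
assert np.isclose(v_bar @ v_bar, 2 * rho)
```

Since $h$ is convex and strictly decreasing on $(-d_{\min}, \infty)$, Newton's method from the left of the root converges without safeguards here; the techniques of Refs. 3, 4 give more robust variants.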

We now make the following remarks.

Remark 2.1. In the paragraph immediately following (9b), we mentioned a way in which collinear scaling algorithms for the dense case may be obtained by selecting the updates $L_{k+1}$ and $h_{k+1}$ to satisfy (9) in a specific manner. The choice of $\bar{v}$ in this setting that yields a collinear scaling algorithm related to the BFGS method has the form $\alpha L^T \sigma$, with $\alpha$ as given in Ref. 8. It is interesting that, if we set $\mathcal{L} := \mathcal{T}$ and $M_j := I$, $j = 1, 2, \dots, n$, then the solution to (11a) would be of the form $\alpha L^T \sigma$, with

$$\alpha = \sqrt{2\rho/(\sigma^T L L^T \sigma)}.$$

However, the resulting update $L_+$ of (11c) would not be the same as the one in the above dense collinear scaling algorithm related to the BFGS method.

Remark 2.2. The main part of the computation necessary to form the update in (11c) is the computation of the root of $h$ in (15b) in $(-d_{\min}, \infty)$. Fast algorithms to do so can be developed by making use of the techniques in Refs. 3, 4.

3. Conclusions

In this note, we have given a procedure to specify the parameter vector (in our notation, $\bar{v}$) left unspecified in the class of collinear scaling algorithms that Sorensen (Ref. 1) proposed for sparse unconstrained minimization. With $\bar{v}$ specified as in this note, this class of algorithms of Sorensen (Ref. 1) is related to the works of Zhang and Tewarson (Ref. 2) and of Dennis and Vu (Ref. 5). These relations underscore the contribution of this note. The work of Dennis and Vu (Ref. 5) demonstrates the difficulties that one would encounter in attempting to extend the work of Zhang and Tewarson (Ref. 2) to the sparse case within the framework of affine scalings. With $\bar{v}$ specified


as in this note, the sparse collinear scaling algorithm of Sorensen (Ref. 1) may be viewed as an extension of the work of Zhang and Tewarson (Ref. 2) from affine scalings to collinear scalings to handle the sparse case. It is also a way to address the difficulties demonstrated by the work of Dennis and Vu (Ref. 5), through the use of the additional degrees of freedom that a collinear scaling possesses relative to an affine scaling.

References

1. SORENSEN, D. C., Collinear Scaling and Sequential Estimation in Sparse Optimization Algorithms, Algorithms and Theory in Filtering and Control, Edited by D. C. Sorensen and R. J. B. Wets, Mathematical Programming Study, Vol. 18, pp. 135-159, 1982.

2. ZHANG, Y., and TEWARSON, R. P., Least-Change Updates to Cholesky Factors Subject to the Nonlinear Quasi-Newton Condition, IMA Journal of Numerical Analysis, Vol. 7, pp. 509-521, 1987.

3. MORÉ, J. J., The Levenberg-Marquardt Algorithm: Implementation and Theory, Lecture Notes in Mathematics, Springer-Verlag, Berlin, Germany, Vol. 630, pp. 105-116, 1978.

4. SORENSEN, D. C., Newton's Method with a Model Trust-Region Modification, SIAM Journal on Numerical Analysis, Vol. 19, pp. 409-426, 1982.

5. DENNIS, J. E., JR., and Vu, P., Toward Direct Sparse Updates of Cholesky Factors, Technical Report 83-13, Department of Mathematical Sciences, Rice University, Houston, Texas, 1983.

6. DAVIDON, W. C., Conic Approximations and Collinear Scalings for Optimizers, SIAM Journal on Numerical Analysis, Vol. 17, pp. 268-281, 1980.

7. SORENSEN, D. C., The Q-Superlinear Convergence of a Collinear Scaling Algorithm for Unconstrained Optimization, SIAM Journal on Numerical Analysis, Vol. 19, pp. 409-426, 1980.

8. ARIYAWANSA, K. A., Deriving Collinear Scaling Algorithms as Extensions of Quasi-Newton Methods and the Local Convergence of DFP- and BFGS-Related Collinear Scaling Algorithms, Mathematical Programming, Vol. 49, pp. 23-48, 1990.

9. ARIYAWANSA, K. A., and LAU, D. T. M., A Collinear Scaling Algorithm for Sparse Unconstrained Minimization, Proceedings of the 1988 Conference on Information Sciences and Systems, Edited by P. J. Ramadge and S. Verdú, Princeton University, Princeton, New Jersey, pp. 488-493, 1988.