A modified two-point stepsize gradient algorithm for unconstrained minimization



This article was downloaded by: [Ben Gurion University of the Negev]. On: 28 October 2013, at: 00:25. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Optimization Methods and Software. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/goms20

A modified two-point stepsize gradient algorithm for unconstrained minimization. Saman Babaie-Kafaki (a, b) and Masoud Fatemi (c). (a) Department of Mathematics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, PO Box 35195-363, Semnan, Iran; (b) School of Mathematics, Institute for Research in Fundamental Sciences (IPM), PO Box 19395-5746, Tehran, Iran; (c) Faculty of Mathematical Sciences, Sharif University of Technology, Tehran, Iran. Published online: 12 Mar 2012.

To cite this article: Saman Babaie-Kafaki & Masoud Fatemi (2013) A modified two-point stepsize gradient algorithm for unconstrained minimization, Optimization Methods and Software, 28:5, 1040-1050, DOI: 10.1080/10556788.2012.667811

To link to this article: http://dx.doi.org/10.1080/10556788.2012.667811




Optimization Methods & Software, 2013, Vol. 28, No. 5, 1040–1050, http://dx.doi.org/10.1080/10556788.2012.667811

A modified two-point stepsize gradient algorithm for unconstrained minimization

Saman Babaie-Kafaki (a, b, *) and Masoud Fatemi (c)

(a) Department of Mathematics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, PO Box 35195-363, Semnan, Iran; (b) School of Mathematics, Institute for Research in Fundamental Sciences (IPM), PO Box 19395-5746, Tehran, Iran; (c) Faculty of Mathematical Sciences, Sharif University of Technology, Tehran, Iran

(Received 9 September 2010; final version received 15 February 2012)

Based on a modified secant equation proposed by Li and Fukushima, we derive a stepsize for the Barzilai–Borwein gradient method. Then, using the newly proposed stepsize and another effective stepsize proposed by Dai et al. in an adaptive scheme based on the convexity of the objective function, we suggest a modified two-point stepsize gradient algorithm. We also show that the limit point of the sequence generated by our algorithm is first-order critical. Finally, numerical comparisons on a set of unconstrained optimization test problems from the CUTEr collection are presented. First, we compare the performance of our algorithm with two other two-point stepsize gradient algorithms proposed by Dai et al. and Raydan. Then, numerical comparisons are made between implementations of our algorithm and two conjugate gradient methods, proposed by Gilbert and Nocedal, and by Hestenes and Stiefel, as well as the limited-memory BFGS algorithm proposed by Liu and Nocedal. Furthermore, to provide numerical support for our adaptive approach, we compare two further two-point stepsize gradient algorithms, one applying the stepsize proposed by Dai et al. and the other applying our newly proposed stepsize, with our algorithm, which applies both of these stepsizes together. Numerical results demonstrating the efficiency of our algorithm, in the sense of the performance profile introduced by Dolan and Moré, are reported.

Keywords: unconstrained optimization; two-point stepsize gradient algorithm; modified secant equation; convexity; nonmonotone line search

Mathematics Subject Classifications (2010): 90C52; 65K05; 49M37; 26B25

1. Introduction

Optimization problems arise naturally in many applications such as advanced engineering design, financial planning, data analysis, signal and image processing, and network design. Among continuous optimization problems, an essential problem that has attracted special attention is the unconstrained optimization problem, i.e.,

min_{x∈R^n} f(x),   (1)

where f : R^n → R is a smooth function whose gradient is available.

*Corresponding author. Email: [email protected]

© 2013 Taylor & Francis


One of the simplest and most fundamental methods for solving (1) is the gradient (or steepest descent) method proposed by Cauchy [4], an iterative method in which

x_{k+1} = x_k − α_k g_k,   (2)

where g_k = ∇f(x_k) and α_k is the stepsize, which can be computed by solving the following exact line search subproblem:

α_k = arg min_{α>0} f(x_k − α g_k).
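For a convex quadratic f(x) = ½ xᵀAx − bᵀx, this subproblem has the closed form α = gᵀg / gᵀAg. The following minimal Python sketch (the function name and interface are ours, for illustration only) shows the resulting steepest descent iteration:

```python
import numpy as np

def exact_linesearch_gd(A, b, x, tol=1e-10, max_iter=10_000):
    """Steepest descent for f(x) = 0.5 x^T A x - b^T x with A symmetric
    positive definite; the exact stepsize is alpha = g^T g / g^T A g."""
    for _ in range(max_iter):
        g = A @ x - b                      # gradient of the quadratic
        if np.linalg.norm(g) <= tol:
            break
        x = x - (g @ g) / (g @ A @ g) * g  # exact line search step
    return x

# The minimizer of 0.5 x^T A x - b^T x is A^{-1} b = (1, 2) here.
x_min = exact_linesearch_gd(np.diag([3.0, 1.0]), np.array([3.0, 2.0]), np.zeros(2))
```

Even on this mildly ill-conditioned 2 × 2 example, the method needs dozens of iterations, which illustrates the slow linear convergence discussed next.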

Unfortunately, the gradient method performs poorly: it converges only linearly and is badly affected by ill conditioning [1,9]. Therefore, many efforts have been made to overcome these defects (see, for example, [2,3,6,18,19]).

Barzilai and Borwein (BB) [2] proposed one of the most efficient modified gradient methods, namely the BB gradient method or two-point stepsize gradient method. In the BB gradient method, the stepsize α_k is derived from a two-point approximation of the secant equation [20]. As a brief comment on the BB gradient method, note that the gradient iteration (2) can be written as follows:

x_{k+1} = x_k − D_k g_k,   (3)

where D_k = α_k I. In order to make the matrix D_k satisfy the secant equation, Barzilai and Borwein computed α_k by solving the following optimization problem:

min_α ‖D_k^{−1} s_{k−1} − y_{k−1}‖,

where s_{k−1} = x_k − x_{k−1}, y_{k−1} = g_k − g_{k−1}, and ‖·‖ stands for the Euclidean norm. This yields

α_k = (s_{k−1}^T s_{k−1}) / (s_{k−1}^T y_{k−1}).
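A minimal Python sketch of the resulting BB iteration follows (all names are ours, for illustration; the first step is a plain gradient step, since no previous point is yet available):

```python
import numpy as np

def bb_gradient(grad, x, alpha0=1.0, tol=1e-8, max_iter=1000):
    """Barzilai-Borwein gradient method with stepsize s^T s / s^T y."""
    g = grad(x)
    x_new = x - alpha0 * g                 # first step: plain gradient step
    for _ in range(max_iter):
        g_new = grad(x_new)
        if np.linalg.norm(g_new) <= tol:
            break
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)          # BB stepsize
        x, g = x_new, g_new
        x_new = x - alpha * g
    return x_new

# Strictly convex quadratic f(x) = 0.5 x^T A x, with minimizer at the origin.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
x_star = bb_gradient(lambda x: A @ x, np.array([2.0, -1.0]))
```

On this two-dimensional quadratic the iteration reaches the minimizer within a handful of steps, consistent with the R-superlinear behaviour discussed below.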

If the number of unknowns is two (n = 2), Barzilai and Borwein [2] established an R-superlinear convergence property of the BB gradient method for strictly convex quadratic objective functions. Raydan [18] established the convergence of the BB gradient method when applied to the minimization of a strictly convex quadratic objective function of any number of variables. For general continuously differentiable objective functions, Raydan [19] incorporated a globalization scheme of the BB gradient method with the nonmonotone line search technique proposed by Grippo et al. [12] and introduced the 'Global Barzilai and Borwein' (GBB) algorithm. The numerical results in [19] showed that the GBB algorithm is competitive with, and sometimes preferable to, several famous conjugate gradient algorithms. Dai and Liao [5] established an R-linear convergence property for strictly convex quadratic objective functions, and also a locally R-linear convergence property for general objective functions, for the BB gradient method combined with the nonmonotone line search procedure. Dai et al. [6] used a quadratic and a cubic interpolation of the objective function f (or, equivalently, the non-quasi-Newton updates suggested in [21,22]) and proposed two modified GBB algorithms with better numerical performance than an efficient nonmonotone spectral projected gradient method proposed by Birgin et al. [3]. Although the two algorithms proposed in [6] are approximately competitive, the algorithm in which the stepsize is computed based on the cubic interpolation of the objective function, more exactly, with the stepsize computed by

α_k = (s_{k−1}^T s_{k−1}) / (6(f_{k−1} − f_k) + 4 g_k^T s_{k−1} + 2 g_{k−1}^T s_{k−1}) =: ᾱ_k,   (4)

is more efficient than the other. An interesting feature of the two modified stepsizes proposed in [6] is their use of function values in addition to gradient values. Further discussion of the local behaviour of the BB gradient method is given by Fletcher in [8].
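The stepsize (4) can be computed from two iterates, their gradients, and their function values. A small Python sketch follows (the interface is ours, assuming the formula as stated in (4)). For a quadratic f, the denominator reduces to s_{k−1}ᵀ ∇²f s_{k−1}, so (4) coincides with the classical two-point stepsize:

```python
import numpy as np

def dai_stepsize(s, g_prev, g_curr, f_prev, f_curr):
    """Stepsize (4): cubic-interpolation-based two-point stepsize.
    s = x_k - x_{k-1}; g_prev, g_curr are the gradients and f_prev, f_curr
    the function values at x_{k-1} and x_k, respectively."""
    denom = 6.0 * (f_prev - f_curr) + 4.0 * (g_curr @ s) + 2.0 * (g_prev @ s)
    return (s @ s) / denom

# For f(x) = x^2 (so f'' = 2), the stepsize is 1/2 regardless of the points:
alpha_bar = dai_stepsize(np.array([2.0]), np.array([2.0]), np.array([6.0]), 1.0, 9.0)
```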


Here, we first derive a stepsize for the BB gradient method using a modified secant equation proposed by Li and Fukushima [15]. Then, using the newly proposed stepsize and the stepsize ᾱ_k computed by (4) in an adaptive scheme based on the convexity of the objective function, we suggest a two-point stepsize gradient algorithm with a promising numerical performance. This work is organized as follows. In Section 2, we propose our modified two-point stepsize gradient algorithm. In Section 3, we compare the numerical performance of our algorithm with two efficient two-point stepsize gradient algorithms proposed by Dai et al. [6] and Raydan [19], and with two other two-point stepsize gradient algorithms, one applying the stepsize ᾱ_k defined by (4) and the other applying our newly proposed stepsize, as well as with two conjugate gradient methods proposed by Gilbert and Nocedal [10], and Hestenes and Stiefel [14], and the limited-memory BFGS algorithm proposed by Liu and Nocedal [16]. Finally, we draw conclusions in Section 4.

2. A modified two-point stepsize gradient algorithm

Here, we first briefly discuss the modified secant equation proposed by Li and Fukushima [15], and then describe the different parts of our algorithm in detail.

Quasi-Newton methods are well-known efficient methods for solving optimization problems. These methods approximate the Hessian ∇²f(x_k) by an n × n symmetric matrix B_k. A quasi-Newton method is effectively characterized by the way in which B_{k−1} is updated to obtain a new matrix B_k in the form

B_k = B_{k−1} + ΔB_k,

where ΔB_k is a correction matrix [20]. The matrix B_k is required to satisfy a suitable condition involving second-order information (to learn more, see [15,21–23]). The most popular condition is the (standard) secant condition, that is,

B_k s_{k−1} = y_{k−1}.

Many efforts have been devoted to investigating the global convergence properties of quasi-Newton methods for convex objective functions (see [15] and the references therein). Recently, Li and Fukushima [15] proposed a modified BFGS method, namely the MBFGS method, which is globally and locally superlinearly convergent even without a convexity assumption on the objective function (see also [13]). In the MBFGS method, the following update formula is suggested:

B_k = B_{k−1} − (B_{k−1} s_{k−1} s_{k−1}^T B_{k−1}) / (s_{k−1}^T B_{k−1} s_{k−1}) + (ȳ_{k−1} ȳ_{k−1}^T) / (s_{k−1}^T ȳ_{k−1}),   (5)

with

ȳ_{k−1} = y_{k−1} + h_{k−1} ‖g_{k−1}‖^r s_{k−1},

where r > 0, and h_{k−1} > 0 is defined by

h_{k−1} = C + max{ −(s_{k−1}^T y_{k−1}) / ‖s_{k−1}‖², 0 } ‖g_{k−1}‖^{−r},

with some constant C > 0 (see also [24]). Thus, the MBFGS update (5) satisfies the following modified secant condition:

B_k s_{k−1} = ȳ_{k−1}.   (6)


It is remarkable that if s_{k−1}^T y_{k−1} < 0, then we have

s_{k−1}^T ȳ_{k−1} = C ‖g_{k−1}‖^r ‖s_{k−1}‖² > 0,   (7)

otherwise,

s_{k−1}^T ȳ_{k−1} = s_{k−1}^T y_{k−1} + C ‖g_{k−1}‖^r ‖s_{k−1}‖² > 0.   (8)

Therefore, the MBFGS update (5) preserves positive definiteness and, consequently, the search directions in the MBFGS method are always descent directions [20].

Here, taking advantage of the global and superlinear convergence properties of the MBFGS method, which does not need a convexity assumption on the objective function (and is thus applicable to a larger class of objective functions), we require D_k in (3) to satisfy the modified secant equation (6) when nonconvexity of the objective function is detected during the kth iteration. In this situation, we compute the stepsize α_k by the following formula:

α_k = (s_{k−1}^T s_{k−1}) / (s_{k−1}^T ȳ_{k−1}) =: α̃_k.   (9)

This new stepsize has a nice property: for each k, from (7) and (8), we have α̃_k > 0.
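A small Python sketch of the stepsize (9) follows (the interface is ours, for illustration), with ȳ computed via the Li–Fukushima modification; positivity of the denominator is guaranteed by (7) and (8):

```python
import numpy as np

def modified_stepsize(s, y, g_prev, C=0.1, r=1.0):
    """Stepsize (9): s^T s / s^T y_bar, where
    y_bar = y + h * ||g_prev||^r * s and
    h = C + max(-s^T y / ||s||^2, 0) * ||g_prev||^(-r)."""
    gn = np.linalg.norm(g_prev)
    h = C + max(-(s @ y) / (s @ s), 0.0) * gn ** (-r)
    y_bar = y + h * gn ** r * s
    return (s @ s) / (s @ y_bar)  # positive by (7) and (8)

# Even with negative curvature s^T y < 0, the stepsize stays positive:
alpha_tilde = modified_stepsize(np.array([1.0, 0.0]),
                                np.array([-1.0, 0.0]),
                                np.array([2.0, 0.0]))
```

In the negative-curvature case the denominator collapses to C ‖g_{k−1}‖^r ‖s_{k−1}‖², as in (7), so here the result is 1/(0.1 · 2) = 5.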

An important part of our algorithm is finding criteria to detect nonconvexity of the objective function f during an iteration. Here, the following two strategies are considered:

(1) For a general nonlinear objective function f, from the mean-value theorem we can write

s_{k−1}^T y_{k−1} = s_{k−1}^T ∇²f(x_{k−1} + t s_{k−1}) s_{k−1},   (10)

for some t ∈ (0, 1). Therefore, when s_{k−1}^T y_{k−1} < 0, from (10) we conclude that in some neighbourhood N of x_k, ∇²f(x) is not positive semidefinite, that is, in this neighbourhood f is nonconvex. So, a criterion is obtained based on the value of s_{k−1}^T y_{k−1}.

(2) To introduce another criterion, we need the following theorem.

Theorem 2.1 [20] Let S ⊂ R^n be a nonempty open convex set and f : S → R be a differentiable function. Then f is convex if and only if

f(y) ≥ f(x) + ∇f(x)^T (y − x),   ∀x, y ∈ S.

Using Theorem 2.1, at the kth iteration, if we have

f(x_k) < f(x_{k−1}) + ∇f(x_{k−1})^T s_{k−1},

then we conclude that in some neighbourhood N of x_k the objective function f is nonconvex. So, another criterion is obtained based on the value of f(x_k) − (f(x_{k−1}) + ∇f(x_{k−1})^T s_{k−1}).

Another important part of a two-point stepsize gradient algorithm is the line search procedure.

In our algorithm, to perform a line search we use the nonmonotone line search condition first proposed by Grippo et al. [12] and then adopted by Raydan [19], Birgin et al. [3] and Dai et al. [5,6]. Following this guideline, a stepsize α along the descent direction −g_k is computed such that the following condition is satisfied:

f(x_k − α g_k) ≤ max_{0≤j≤min{k−1,M}} f(x_{k−j}) − γ α ‖g_k‖².
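A backtracking implementation of this condition can be sketched as follows (the interface is ours, for illustration; `f_hist` is assumed to hold the previous function values, most recent last):

```python
import numpy as np

def nonmonotone_search(f, x, g, alpha, f_hist, M=10, gamma=1e-4, sigma=0.5):
    """Shrink alpha until f(x - alpha*g) <= max of the last min(k-1, M)+1
    stored function values minus gamma * alpha * ||g||^2 (Grippo et al.)."""
    f_ref = max(f_hist[-(M + 1):])  # nonmonotone reference value
    while f(x - alpha * g) > f_ref - gamma * alpha * (g @ g):
        alpha *= sigma              # backtrack
    return alpha

# Quadratic example: an oversized trial stepsize is halved until accepted.
x0 = np.array([1.0, 1.0])
step = nonmonotone_search(lambda z: 0.5 * (z @ z), x0, x0.copy(), 10.0, [1.0])
```

Because the reference value is a maximum over recent function values rather than the current one, occasional increases of f are tolerated, which is what makes the BB stepsizes usable in a globalized scheme.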

Based on the above discussion, we now state our algorithm in detail. In the following algorithm, we adopt the suggestions in [6] to set the parameter values.


Algorithm 1 A modified two-point stepsize gradient algorithm (MGBB).

Step 0. {Initialization} Give x_1 ∈ R^n, α_1 = ‖g_1‖_∞^{−1}, M = 10, γ = 10^{−4}, ε = 10^{−30} and ϑ = 10^{−5}; set k = 1.

Step 1. {Stopping criterion} If ‖g_k‖ = 0, then stop.

Step 2. {Stepsize computation}
(1) If k = 1, then go to Step 3.
(2) Calculate ᾱ_k and α̃_k by (4) and (9), respectively.
(3) If s_{k−1}^T y_{k−1} < ϑ, or f_k − (f_{k−1} + g_{k−1}^T s_{k−1}) < ϑ, or ᾱ_k < 0, then

α = max{ε, min{1/ε, α̃_k}},

otherwise

α = max{ε, min{1/ε, ᾱ_k}}.

Step 3. {Nonmonotone line search} If

f(x_k − α g_k) ≤ max_{0≤j≤min{k−1,M}} f(x_{k−j}) − γ α ‖g_k‖²,

then set α_k = α, x_{k+1} = x_k − α_k g_k, k = k + 1, and go to Step 1.

Step 4. Choose σ ∈ [0.1, 0.9], set α = σα, and go to Step 3.

In the above algorithm, ϑ is a parameter used to detect the nonconvexity of the objective function approximately. To compute α̃_k in Step 2, among the different values of C ∈ {10^{−k}}_{k=0}^{4}, we set C = 0.1 because of the promising numerical results obtained with this value; for r, we adopted the suggestion in [24] and set r = 1 if ‖g_k‖ > 1, and r = 3 otherwise. Also, the value of σ in Step 4 is set equal to 0.5.

Since the stepsize in Algorithm 1 is bounded below by ε, the following global convergence theorem can be proved similarly to the proof of Theorem 2.1 in [19].

Theorem 2.2 Assume that L = {x : f(x) ≤ f(x_0)} is a bounded set. Let f : R^n → R be continuously differentiable in some neighbourhood N of L, and let {x_k}_{k=0}^∞ be the sequence generated by Algorithm 1. Then either g(x_j) = 0 for some finite j, or lim_{k→∞} ‖g_k‖ = 0.

3. Numerical experiments

In this section, we present our numerical experiments, performed by applying a MATLAB implementation of our proposed algorithm MGBB, in comparison with the following seven algorithms:

• GBB: the two-point stepsize gradient algorithm of [19].
• DYY: Algorithm 3.2 of [6].
• LBFGS: the limited-memory BFGS algorithm of [16], with storage of the available information from the m = 5 most recent iterations.
• PRP+: a conjugate gradient method with the nonnegative update parameter β_k = max{g_{k+1}^T y_k / ‖g_k‖², 0}, proposed in [10] (see also [17]).
• HS: a conjugate gradient method with the update parameter β_k = (g_{k+1}^T y_k) / (d_k^T y_k), proposed in [14].


• M1: Algorithm 1 with the following revised form of Step 2 (stepsize computation): (1) If k = 1, then go to Step 3. (2) Calculate ᾱ_k by (4) and set α = max{ε, min{1/ε, ᾱ_k}}.
• M2: Algorithm 1 with the following revised form of Step 2 (stepsize computation): (1) If k = 1, then go to Step 3. (2) Calculate α̃_k by (9) and set α = max{ε, min{1/ε, α̃_k}}.

Note that for the conjugate gradient methods PRP+ and HS we used the strong Wolfe line search conditions [20] with the same parameters as in [10].

Since two-point stepsize gradient algorithms have mainly been designed for large-scale problems, we selected a set of 120 unconstrained optimization test problems from the CUTEr collection [11], with minimum dimension equal to 100, as specified in Table 1. Note that no gradient evaluation is required in the line search procedure of MGBB and, consequently, the number of gradient evaluations in MGBB is equal to the number of iterations. Hence, we considered the

Table 1. Specifications of test functions.

Function n       Function n        Function n

ARGLINA 100      DIXMAANF 9,000    NONDIA 1,000
ARGLINA 200      DIXMAANG 1,500    NONDIA 5,000
ARWHEAD 100      DIXMAANG 3,000    NONDIA 10,000
ARWHEAD 1,000    DIXMAANG 9,000    PENALTY1 100
ARWHEAD 5,000    DIXMAANH 1,500    PENALTY1 500
BDQRTIC 100      DIXMAANH 3,000    PENALTY1 1,000
BDQRTIC 1,000    DIXMAANH 9,000    POWELLSG 1,000
BDQRTIC 5,000    DIXMAANJ 1,500    POWELLSG 5,000
BROWNAL 100      DIXMAANJ 3,000    POWELLSG 10,000
BROWNAL 200      DIXMAANJ 9,000    QUARTC 1,000
BROYDN7D 500     DIXMAANL 1,500    QUARTC 5,000
BROYDN7D 1,000   DIXMAANL 3,000    QUARTC 10,000
BRYBND 1,000     DIXMAANL 9,000    RAYBENDL 130
BRYBND 5,000     DQDRTIC 100       RAYBENDL 1,026
BRYBND 10,000    DQDRTIC 1,000     RAYBENDL 2,050
CHAINWOO 100     DQDRTIC 5,000     SCHMVETT 100
CHAINWOO 1,000   DQRTIC 100        SCHMVETT 1,000
CHAINWOO 4,000   DQRTIC 1,000      SCHMVETT 5,000
COSINE 100       DQRTIC 5,000      SENSORS 100
COSINE 1,000     EG2 1,000         SPARSQUR 1,000
COSINE 10,000    EDENSCH 2,000     SPARSQUR 5,000
CURLY10 10,000   ENGVAL1 100       SPARSQUR 10,000
CURLY20 10,000   ENGVAL1 1,000     SPMSRTLS 1,000
DIXMAANA 1,500   ENGVAL1 5,000     SPMSRTLS 4,999
DIXMAANA 3,000   FLETCBV2 1,000    SROSENBR 1,000
DIXMAANA 9,000   FLETCBV2 5,000    SROSENBR 5,000
DIXMAANB 1,500   FLETCBV2 10,000   SROSENBR 10,000
DIXMAANB 3,000   FMINSRF2 5,625    TOINTGSS 1,000
DIXMAANB 9,000   GENROSE 100       TOINTGSS 5,000
DIXMAANC 1,500   GENROSE 500       TOINTGSS 10,000
DIXMAANC 3,000   LIARWHD 1,000     TQUARTIC 1,000
DIXMAANC 9,000   LIARWHD 5,000     TQUARTIC 5,000
DIXMAAND 1,500   LIARWHD 10,000    VARDIM 100
DIXMAAND 3,000   LMINSURF 121      VARDIM 200
DIXMAAND 9,000   LMINSURF 1,024    VAREIGVL 100
DIXMAANE 1,500   LMINSURF 5,625    VAREIGVL 500
DIXMAANE 3,000   MANCINO 100       VAREIGVL 1,000
DIXMAANE 9,000   MOREBV 100        WOODS 1,000
DIXMAANF 1,500   MOREBV 1,000      WOODS 4,000
DIXMAANF 3,000   MOREBV 5,000      WOODS 10,000


number of function and gradient evaluations to compare the algorithms. Also, all attempts to solve the test problems were limited to reaching a maximum of 10,000 iterations or achieving a solution with

‖g_k‖_∞ ≤ 10^{−6}(1 + |f_k|).

Efficiency comparisons were made using the performance profile introduced by Dolan and Moré [7]. The performance profile gives, for every ω ≥ 1, the proportion p(ω) of the test problems on which each considered algorithmic variant has a performance within a factor of ω of the best.
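The profile itself is straightforward to compute from a matrix of per-problem costs; a minimal sketch follows (the interface is ours, for illustration):

```python
import numpy as np

def performance_profile(costs, omegas):
    """costs: (n_problems, n_solvers) array of measured costs, e.g. function
    evaluation counts. Returns, for each omega, the fraction p(omega) of
    problems each solver solves within a factor omega of the best solver."""
    ratios = costs / costs.min(axis=1, keepdims=True)  # performance ratios
    return [(ratios <= omega).mean(axis=0) for omega in omegas]

# Two problems, two solvers: each solver is best on exactly one problem.
costs = np.array([[1.0, 2.0],
                  [4.0, 2.0]])
p1, p2 = performance_profile(costs, [1.0, 2.0])
```

Here p(1) gives the fraction of problems on which a solver is the outright winner, and p(ω) for large ω approaches the solver's overall success rate.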

The first part of our comparisons was made on the two-point stepsize gradient algorithms MGBB, GBB and DYY. The results, demonstrated by Figures 1 and 2, show that, in terms of the number of function and gradient evaluations, MGBB outperforms DYY, and DYY in turn slightly outperforms GBB.

Figure 1. Number of function evaluations performance profiles for MGBB, GBB and DYY.

Figure 2. Number of gradient evaluations performance profiles for MGBB, GBB and DYY.


Figure 3. Number of function evaluations performance profiles for MGBB, LBFGS, PRP+ and HS.

Figure 4. Number of gradient evaluations performance profiles for MGBB, LBFGS, PRP+ and HS.

The second part of our comparisons was made on the methods MGBB, LBFGS, PRP+ and HS. The results, demonstrated by Figures 3 and 4, show that MGBB outperforms HS and is competitive with LBFGS in terms of the number of function and gradient evaluations. Furthermore, as these figures show, MGBB is competitive with PRP+ in terms of the number of function evaluations, while in terms of the number of gradient evaluations MGBB outperforms PRP+.

In order to provide numerical support for our adaptive approach, in the final part of our numerical comparisons the MGBB algorithm, which applies the two stepsizes ᾱ_k and α̃_k together, was compared with the two-point stepsize gradient algorithms M1 and M2 which, respectively, apply the stepsizes ᾱ_k and α̃_k solely. The results, demonstrated by Figures 5 and 6, show that MGBB outperforms M1 and M2 in terms of the number of function and gradient evaluations. Furthermore, Figures 5 and 6 show that although with respect to the number of gradient evaluations


Figure 5. Number of function evaluations performance profiles for MGBB, M1 and M2.

Figure 6. Number of gradient evaluations performance profiles for MGBB, M1 and M2.

M1 and M2 are approximately competitive, M2 outperforms M1 in terms of the number of function evaluations. So, the stepsize α̃_k seems to be more effective than the stepsize ᾱ_k.

It is remarkable that, on average, in 37.29% of the iterations of MGBB, α̃_k is accepted as the stepsize. Thus, the proposed adaptive choice of the stepsize based on the convexity of the objective function turns out to be practically effective.

4. Conclusions

We derived a stepsize for the BB gradient method based on the modified secant equation proposed by Li and Fukushima [15]. Then, we proposed a modified two-point stepsize gradient algorithm in which the stepsize is chosen between the new stepsize (9) and the effective stepsize (4) suggested


by Dai et al. [6], in an adaptive scheme based on the convexity of the objective function. We also showed that the limit point of the sequence generated by our algorithm is first-order critical. Using the performance profile introduced by Dolan and Moré [7], we made numerical comparisons between implementations of our algorithm (MGBB) and two other two-point stepsize gradient algorithms proposed by Dai et al. [6] (DYY) and Raydan [19] (GBB), as well as the two conjugate gradient methods PRP+ and HS, proposed by Gilbert and Nocedal [10], and Hestenes and Stiefel [14], respectively, and the LBFGS algorithm proposed by Liu and Nocedal [16], on a set of unconstrained optimization test problems from the CUTEr collection. Furthermore, in order to provide numerical support for our adaptive approach, the two two-point stepsize gradient algorithms M1 and M2 which, respectively, apply the stepsizes (4) and (9), were compared with our algorithm, which applies both of these stepsizes together. The comparison results showed that MGBB outperforms the methods GBB, DYY, M1, M2 and HS, and is competitive with the efficient methods LBFGS and PRP+. Thus, our proposed adaptive choice of the stepsize based on the convexity of the objective function turns out to be practically effective.

Acknowledgements

This research was in part supported by a grant from IPM (No. 90900023), and in part by the Research Councils of Semnan University and Sharif University of Technology. The authors are grateful to the anonymous reviewer for his valuable comments and suggestions leading to an improvement in the quality of this work, to Professor Michael Navon for providing the strong Wolfe line search code, and to Professor Yuhong Dai for his helpful hints.

References

[1] H. Akaike, On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method, Ann. Inst. Statist. Math. Tokyo 11 (1959), pp. 1–17.
[2] J. Barzilai and J.M. Borwein, Two-point stepsize gradient methods, IMA J. Numer. Anal. 8(1) (1988), pp. 141–148.
[3] E.G. Birgin, J.M. Martínez, and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets, SIAM J. Optim. 10(4) (2000), pp. 1196–1211.
[4] A. Cauchy, Méthodes générales pour la résolution des systèmes d'équations simultanées, C.R. Acad. Sci. Par. 25 (1847), pp. 536–538.
[5] Y.H. Dai and L.Z. Liao, R-linear convergence of the Barzilai and Borwein gradient method, IMA J. Numer. Anal. 22 (2002), pp. 1–10.
[6] Y.H. Dai, Y.J. Yuan, and Y. Yuan, Modified two-point stepsize gradient methods for unconstrained optimization, Comput. Optim. Appl. 22(1) (2002), pp. 103–109.
[7] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles, Math. Program. 91 (2002), pp. 201–213.
[8] R. Fletcher, On the Barzilai–Borwein method, in Optimization and Control with Applications, L. Qi, K.L. Teo, and X.Q. Yang, eds., Springer, Berlin, 2005, pp. 235–256.
[9] G.E. Forsythe, On the asymptotic directions of the s-dimensional optimum gradient method, Numer. Math. 11 (1968), pp. 57–76.
[10] J.C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optim. 2(1) (1992), pp. 21–42.
[11] N.I.M. Gould, D. Orban, and Ph.L. Toint, CUTEr, a constrained and unconstrained testing environment, revisited, ACM Trans. Math. Softw. 29 (2003), pp. 373–394.
[12] L. Grippo, F. Lampariello, and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM J. Numer. Anal. 23 (1986), pp. 707–716.
[13] Q. Guo, J.G. Liu, and D.H. Wang, A modified BFGS method and its superlinear convergence in nonconvex minimization with general line search rule, J. Appl. Math. Comput. 28 (2008), pp. 435–446.
[14] M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49 (1952), pp. 409–436.
[15] D.H. Li and M. Fukushima, A modified BFGS method and its global convergence in nonconvex minimization, J. Comput. Appl. Math. 129 (2001), pp. 15–35.
[16] D.C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Math. Program. 45 (1989), pp. 503–528.
[17] M.J.D. Powell, Nonconvex minimization calculations and the conjugate gradient method, in Numerical Analysis (Dundee, 1983), D.F. Griffiths, ed., Lecture Notes in Mathematics, vol. 1066, Springer, Berlin, 1984, pp. 122–141.
[18] M. Raydan, On the Barzilai and Borwein choice of steplength for the gradient method, IMA J. Numer. Anal. 13 (1993), pp. 321–326.
[19] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM J. Optim. 7(1) (1997), pp. 26–33.
[20] W. Sun and Y. Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer, New York, 2006.
[21] Y. Yuan, A modified BFGS algorithm for unconstrained optimization, IMA J. Numer. Anal. 11 (1991), pp. 325–332.
[22] Y. Yuan and R.H. Byrd, Non-quasi-Newton updates for unconstrained optimization, J. Comput. Math. 13(2) (1995), pp. 95–107.
[23] J.Z. Zhang, N.Y. Deng, and L.H. Chen, New quasi-Newton equation and related methods for unconstrained optimization, J. Optim. Theory Appl. 102 (1999), pp. 147–167.
[24] W. Zhou and L. Zhang, A nonlinear conjugate gradient method based on the MBFGS secant condition, Optim. Methods Softw. 21(5) (2006), pp. 707–714.
