The Convergence Properties of a New Kind of Conjugate Gradient Method for Unconstrained Optimization



Applied Mathematical Sciences, Vol. 9, 2015, no. 38, 1845 - 1856

HIKARI Ltd, www.m-hikari.com

http://dx.doi.org/10.12988/ams.2015.411997

The Convergence Properties of a New Kind of

Conjugate Gradient Method for

Unconstrained Optimization

Rabi’u Bashir Yunus1*, Mustafa Mamat1, Abdelrahman Abashar1, 2,

Mohd Rivaie3, Zabidin Salleh4, and Zahrahtul Amani Zakaria1

1Faculty of Informatics and Computing

Universiti Sultan Zainal Abidin (UniSZA), Kuala Terengganu, Malaysia

2Faculty of Engineering, Red Sea University, Sudan

3Department of Computer Sciences and Mathematics

Universiti Teknologi MARA (UiTM) Terengganu, Campus Kuala Terengganu, Malaysia

4School of Informatics and Applied Mathematics

Universiti Malaysia Terengganu, Malaysia

Copyright © 2014 Rabi’u Bashir Yunus et al. This is an open access article distributed under the

Creative Commons Attribution License, which permits unrestricted use, distribution, and

reproduction in any medium, provided the original work is properly cited.

Abstract

Conjugate gradient (CG) methods are among the most prominent techniques for solving large-scale unconstrained optimization problems, due to their robustness, low memory requirements, and global convergence properties. Numerous studies and modifications have been carried out recently to improve these methods. In this paper, a new modification of a CG coefficient that possesses the global convergence properties is presented. The global convergence result is established under the exact line search. Several numerical experiments show that the proposed formula is robust and efficient when compared with other CG coefficients.

Keywords: Conjugate gradient coefficient; exact line search; global convergence


1. Introduction

The nonlinear conjugate gradient method is designed to solve the following unconstrained optimization problem:

\min_{x \in R^n} f(x), \qquad (1)

where f : R^n \to R is a continuously differentiable function. The iterative formula of the CG method is expressed as

x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \qquad (2)

where x_k is the current iterate and \alpha_k is the stepsize, obtained by carrying out a one-dimensional search along d_k, known as the line search. The most common is the exact line search, given by

\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k). \qquad (3)
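As a worked illustration of (3) (our own example, not one taken from the paper), suppose f is the strictly convex quadratic f(x) = \tfrac{1}{2} x^T A x - b^T x with A symmetric positive definite, so that g_k = A x_k - b. Setting the derivative of f(x_k + \alpha d_k) with respect to \alpha to zero gives the exact step in closed form:

\frac{d}{d\alpha} f(x_k + \alpha d_k) = g_k^T d_k + \alpha \, d_k^T A d_k = 0 \quad \Longrightarrow \quad \alpha_k = -\frac{g_k^T d_k}{d_k^T A d_k}.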

The search direction d_k is defined by

d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases} \qquad (4)

where \beta_k is a scalar known as the conjugate gradient coefficient. Some well-known classical formulas for \beta_k are the Fletcher-Reeves (FR) [16], Polak-Ribière-Polyak (PRP) [2], Hestenes-Stiefel (HS) [11], Liu-Storey (LS) [20], Dai-Yuan (DY) [21], and conjugate descent (CD) [17] formulas, given below:

\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2} \qquad (5)

\beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2} \qquad (6)

\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})} \qquad (7)

\beta_k^{LS} = \frac{g_k^T (g_k - g_{k-1})}{-d_{k-1}^T g_{k-1}} \qquad (8)

\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})} \qquad (9)

\beta_k^{CD} = \frac{\|g_k\|^2}{-d_{k-1}^T g_{k-1}} \qquad (10)


where g_k and g_{k-1} denote the gradients of f(x) at the points x_k and x_{k-1}, respectively, and \|\cdot\| denotes the Euclidean norm of a vector.
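For readers who wish to experiment with these formulas, the following short Python sketch (our own illustration, not part of the original paper; the function name and the NumPy dependency are assumptions) computes the coefficients (5)-(10) from the current gradient g_k, the previous gradient g_{k-1}, and the previous direction d_{k-1}:

import numpy as np

def classical_betas(g, g_prev, d_prev):
    # Classical CG coefficients (5)-(10); g = g_k, g_prev = g_{k-1}, d_prev = d_{k-1}.
    g, g_prev, d_prev = map(np.asarray, (g, g_prev, d_prev))
    y = g - g_prev                                  # gradient difference g_k - g_{k-1}
    return {
        "FR": g @ g / (g_prev @ g_prev),            # (5)
        "PRP": g @ y / (g_prev @ g_prev),           # (6)
        "HS": g @ y / (d_prev @ y),                 # (7)
        "LS": g @ y / -(d_prev @ g_prev),           # (8)
        "DY": g @ g / (d_prev @ y),                 # (9)
        "CD": g @ g / -(d_prev @ g_prev),           # (10)
    }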

Many researchers have studied the global convergence of the above methods. Zoutendijk [5] was the first to prove the global convergence of the FR method with the exact line search. Later, however, Powell [10] pointed out that the FR method shows poor performance in practice due to the jamming phenomenon. The PRP method was long believed to be the most reliable CG method, but its convergence for general nonlinear functions is uncertain [19]. Indeed, Powell [9] showed that even with an exact line search the PRP method may not converge and can cycle infinitely. For a sample of the research conducted on the properties of CG methods, please refer to Al-Baali [8], Wei et al. [22], Touati-Ahmed and Storey [3], Gilbert and Nocedal [6], Hager and Zhang [18], Abashar et al. [1], and Rivaie et al. [12, 13, 14].

In this paper, we present our new \beta_k and compare its performance with the classical FR and PRP formulas and with the RAMI method (see Rivaie et al. [13]). The remaining sections of the paper are organised as follows. In Section 2, the new CG coefficient as well as a general algorithm for CG methods is presented. In Section 3, we give the sufficient descent and global convergence proofs for our new method. Numerical results and discussion are presented in Section 4. Finally, our conclusion based on these comparisons is drawn in Section 5.

2. New proposed method

Recently, Rivaie et al. [13] proposed a new nonlinear conjugate gradient formula defined by

\beta_k^{RAMI} = \frac{g_k^T \left( g_k - \dfrac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{d_{k-1}^T (d_{k-1} - g_k)}. \qquad (11)

Motivated by the above, we propose a new \beta_k, denoted \beta_k^{RMAR}, where RMAR stands for Rabi'u, Mustafa, Abdelrahman and Rivaie. The coefficient \beta_k^{RMAR} is defined as

\beta_k^{RMAR} = \frac{g_k^T \left( g_k - \dfrac{\|g_k\|}{\|d_{k-1}\|} d_{k-1} \right)}{\|d_{k-1}\|^2}. \qquad (12)

The algorithm is given as follows:

Algorithm 2.1

Step 1: Initialization. Given x_0 \in R^n, set k = 0.

Step 2: Compute \beta_k based on (6), (7), and (12).

Step 3: Compute d_k based on (4). If g_k = 0, then stop.

Step 4: Compute \alpha_k based on the exact line search (3).

Step 5: Update the new point based on the iterative formula (2).

Step 6: Convergence test and stopping criteria: if f(x_{k+1}) < f(x_k) and \|g_{k+1}\| \le \epsilon, then stop. Otherwise, go to Step 2 with k = k + 1.
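The following Python sketch illustrates Algorithm 2.1 with the RMAR coefficient (12). It is only a minimal illustration of the steps above, not the authors' MATLAB implementation: the exact line search (3) is approximated numerically with scipy.optimize.minimize_scalar, and the names cg_rmar, f, grad, x0 and the bound on the step size are our own assumptions.

import numpy as np
from scipy.optimize import minimize_scalar

def cg_rmar(f, grad, x0, eps=1e-6, max_iter=10000):
    # Steps 1-6 of Algorithm 2.1 with beta = beta^RMAR from (12).
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # first direction (k = 0) from (4)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:         # stopping criterion ||g_k|| <= eps
            break
        # Step 4: approximate the exact line search (3) by a bounded 1-D minimization
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 1e3), method="bounded").x
        x = x + alpha * d                    # Step 5: update the iterate, cf. (2)
        g_new = grad(x)
        # Step 2: RMAR coefficient (12) with g_k -> g_new and d_{k-1} -> d
        beta = g_new @ (g_new - (np.linalg.norm(g_new) / np.linalg.norm(d)) * d) / (d @ d)
        d = -g_new + beta * d                # Step 3: new search direction, cf. (4)
        g = g_new
    return x

For example, cg_rmar(lambda x: x @ x, lambda x: 2 * x, [10.0, -7.0]) drives the gradient of the sphere function below the tolerance within a few iterations.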

3. Convergence analysis

In this section, the convergence properties of \beta_k^{RMAR} are studied. We begin with the sufficient descent condition.

3.1 Sufficient descent condition

For the sufficient descent condition to hold,

g_k^T d_k \le -C \|g_k\|^2 \quad \text{for all } k \ge 0 \text{ and some constant } C > 0. \qquad (13)

The following theorem shows that our new formula with the exact line search satisfies the sufficient descent condition.

Theorem 1

Consider a CG method with search direction (4) and \beta_k^{RMAR} given by (12). Then condition (13) holds for all k \ge 0.

Proof.

The proof is by induction. If k = 0, then g_0^T d_0 = -\|g_0\|^2, so condition (13) holds (with any 0 < C \le 1).

We also need to show that condition (13) holds for all k \ge 0. From (4) (with k replaced by k + 1), multiplying both sides by g_{k+1}^T gives

g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 + \beta_{k+1}^{RMAR} \, g_{k+1}^T d_k. \qquad (14)

For the exact line search, we know that g_{k+1}^T d_k = 0. Thus,

g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2,

and hence condition (13) holds for k + 1. The proof is completed. ■
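The two identities used in the proof, g_{k+1}^T d_k = 0 and g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2, are easy to check numerically. The following small Python check (our own illustration, not from the paper; the quadratic, the matrix A and the starting point are arbitrary choices) performs one RMAR step with the closed-form exact line search for a quadratic:

import numpy as np

# Quadratic test problem f(x) = 0.5 x^T A x - b^T x with gradient g(x) = A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

x = np.array([4.0, -3.0])
g = grad(x)
d = -g                                            # d_0 = -g_0

alpha = -(g @ d) / (d @ A @ d)                    # exact line search for a quadratic
x_new = x + alpha * d
g_new = grad(x_new)

beta = g_new @ (g_new - (np.linalg.norm(g_new) / np.linalg.norm(d)) * d) / (d @ d)
d_new = -g_new + beta * d                         # next direction from (4)

print(np.isclose(g_new @ d, 0.0))                             # exact line search: True
print(np.isclose(g_new @ d_new, -np.linalg.norm(g_new) ** 2)) # sufficient descent: True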


3.2 Global convergence properties

To study the global convergence properties, we first need to show that \beta_k^{RMAR} is always nonnegative:

\beta_k^{RMAR} = \frac{g_k^T \left( g_k - \dfrac{\|g_k\|}{\|d_{k-1}\|} d_{k-1} \right)}{\|d_{k-1}\|^2} = \frac{\|g_k\|^2}{\|d_{k-1}\|^2} - \frac{\|g_k\| \, g_k^T d_{k-1}}{\|d_{k-1}\|^3} \ge 0, \qquad (15)

since the Cauchy-Schwarz inequality gives g_k^T d_{k-1} \le \|g_k\| \|d_{k-1}\|.

Secondly, there is a need to bound \beta_k^{RMAR} from above. Using the same inequality in the other direction, -g_k^T d_{k-1} \le \|g_k\| \|d_{k-1}\|, we obtain

\beta_k^{RMAR} = \frac{\|g_k\|^2}{\|d_{k-1}\|^2} - \frac{\|g_k\| \, g_k^T d_{k-1}}{\|d_{k-1}\|^3} \le \frac{\|g_k\|^2}{\|d_{k-1}\|^2} + \frac{\|g_k\|^2}{\|d_{k-1}\|^2} = \frac{2 \|g_k\|^2}{\|d_{k-1}\|^2}. \qquad (16)

Therefore,

0 \le \beta_k^{RMAR} \le \frac{2 \|g_k\|^2}{\|d_{k-1}\|^2}.

The following assumptions are required for global convergence analysis of CG

methods.

Assumption 1

(i) The level set \Omega = \{ x \in R^n \mid f(x) \le f(x_0) \} is bounded, where x_0 is the starting point, and f is a continuously differentiable function in a neighborhood N of the level set \Omega.

(ii) The gradient g(x) is Lipschitz continuous in N; that is, there exists a constant L > 0 such that \|g(x) - g(y)\| \le L \|x - y\| for any x, y \in N.

Under these assumptions, we have the following important results.

Lemma 1

Suppose Assumption 1 holds. Consider any CG method of the form (2) and (4), where \alpha_k is obtained by the exact line search (3). Then the following condition, known as the Zoutendijk condition (see Zoutendijk [5]), holds:

\sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \qquad (17)

The proof of the above lemma can be found in [5]. The following convergence theorem for the CG method is based on Lemma 1.


Theorem 2

Suppose that Assumption 1 holds. Consider any CG method of the form (2) and (4), where \alpha_k is obtained by the exact line search (3) and \beta_k is determined by (12). Then

\liminf_{k \to \infty} \|g_k\| = 0. \qquad (18)

Proof.

Suppose, to the contrary, that (18) does not hold. Then there exists a constant \varepsilon > 0 such that \|g_k\| \ge \varepsilon for all k.

Rewriting (4) as

d_{k+1} + g_{k+1} = \beta_{k+1}^{RMAR} d_k,

and squaring both sides of the equation, we obtain

\|d_{k+1}\|^2 = (\beta_{k+1}^{RMAR})^2 \|d_k\|^2 - 2 g_{k+1}^T d_{k+1} - \|g_{k+1}\|^2. \qquad (19)

Substituting (16) (with k replaced by k + 1) into (19), we have

\|d_{k+1}\|^2 \le \frac{4 \|g_{k+1}\|^4}{\|d_k\|^2} - 2 g_{k+1}^T d_{k+1} - \|g_{k+1}\|^2. \qquad (20)

We have already proven that the sufficient descent condition holds; in fact, the proof of Theorem 1 shows that under the exact line search g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2. Hence, from (20),

\|d_{k+1}\|^2 \le \frac{4 \|g_{k+1}\|^4}{\|d_k\|^2} + 2 \|g_{k+1}\|^2 - \|g_{k+1}\|^2 = \frac{4 \|g_{k+1}\|^4}{\|d_k\|^2} + \|g_{k+1}\|^2. \qquad (21)

Dividing both sides of (21) by \|g_{k+1}\|^4 gives

\frac{\|d_{k+1}\|^2}{\|g_{k+1}\|^4} \le \frac{4}{\|d_k\|^2} + \frac{1}{\|g_{k+1}\|^2}. \qquad (22)

Based on Lemma 1, and since the exact line search gives g_k^T d_k = -\|g_k\|^2, we know that

\sum_{k \ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty, \qquad \text{and hence} \qquad \lim_{k \to \infty} \frac{\|g_k\|^4}{\|d_k\|^2} = 0.

Because \|g_k\| \ge \varepsilon, this forces \|d_k\|^2 \to \infty, so that 4 / \|d_k\|^2 \to 0. Combining this with \|g_{k+1}\| \ge \varepsilon in (22), we obtain, for all sufficiently large k,

\frac{\|g_{k+1}\|^4}{\|d_{k+1}\|^2} \ge \frac{1}{\dfrac{4}{\|d_k\|^2} + \dfrac{1}{\varepsilon^2}} \ge \frac{\varepsilon^2}{2} > 0,

which contradicts \lim_{k \to \infty} \|g_k\|^4 / \|d_k\|^2 = 0. Hence, (18) holds. The proof is completed. ■

4. Numerical results

In this section, we carry out numerical experiments to test Algorithm 2.1. We use some of the test problems considered in Andrei [15], listed in Table 1, to analyse the efficiency of our new formula compared with the FR, PRP, and RAMI methods. The comparisons are based on the number of iterations and on the CPU time in seconds. The step size \alpha_k is obtained using the exact line search. We use \varepsilon = 10^{-6} and \|g_k\| \le \varepsilon as the stopping criterion. All problems listed in Table 1 are solved by a MATLAB version 7.6.0 (R2008a) subroutine program. The CPU used was an Intel Celeron (1.4 GHz, 2 MB L3 cache) with 4 GB DDR3 RAM. The performance results are shown in Figures 1 and 2, based on the performance profile introduced by Dolan and Moré [4].

The idea behind this performance profile is to evaluate and compare the performance of a set of solvers S on a test set P. Suppose that there are n_s solvers and n_p problems; for each problem p and solver s, define

t_{p,s} = the computing effort (number of iterations or CPU time) required to solve problem p by solver s.


Table 1: A list of problem functions

No | Function | Dimension | Initial Points

1 | Three Hump Camel | 2 | (-2,2), (2,-2), (11,11), (15,15)

2 | Six Hump Camel | 2 | (8,8), (-8,-8), (10,10), (-10,-10)

3 | Booth | 2 | (10,10), (25,25), (50,50), (100,100)

4 | Treccani | 2 | (5,5), (15,15), (50,50), (100,100)

5 | Zettl | 2 | (5,5), (10,10), (20,20), (50,50)

6 | Fletcher | 2,4,10 | (3,3,…,3), (5,5,…,5), (7,7,…,7), (11,11,…,11)

7 | Sphere | 2,4,10 | (5,5,…,5), (7,7,…,7), (10,10,…,10), (100,100,…,100)

8 | Tridiagonal 2 | 2,4,10 | (5,5,…,5), (10,10,…,10), (15,15,…,15), (20,20,…,20)

9 | Tridiagonal 1 | 2,4,100 | (18,18,…,18), (19,19,…,19), (22,22,…,22), (30,30,…,30)

10 | Extended Penalty | 2,4,10,100 | (101,101,…,101), (109,109,…,109), (141,141,…,141), (200,200,…,200)

11 | Raydan 1 | 2,4,10,100 | (1,1,…,1), (3,3,…,3), (-5,-5,…,-5), (-10,-10,…,-10)

12 | Extended Maratos | 2,4,10,100 | (15,15,…,15), (16,16,…,16), (25,25,…,25), (55,55,…,55)

13 | Hager | 2,4,10,100 | (3,3,…,3), (10,10,…,10), (21,21,…,21), (23,23,…,23)

14 | Quadratic QP2 | 2,4,10,100,500 | (10,10,…,10), (20,20,…,20), (35,35,…,35), (100,100,…,100)

15 | Freudenstein and Roth | 2,4,10,100,500,1000 | (3,3,…,3), (5,5,…,5), (7,7,…,7), (12,12,…,12)

16 | Shallow | 2,4,10,100,500,1000,10000 | (10,10,…,10), (25,25,…,25), (50,50,…,50), (100,100,…,100)

17 | Ex-Tridiagonal 1 | 2,4,10,100,500,1000,10000 | (3,3,…,3), (7,7,…,7), (9,9,…,9), (20,20,…,20)

18 | White and Holst | 2,4,10,100,500,1000,10000 | (3,3,…,3), (6,6,…,6), (9,9,…,9), (12,12,…,12)

19 | Rosenbrock | 2,4,10,100,500,1000,10000 | (13,13,…,13), (16,16,…,16), (20,20,…,20), (30,30,…,30)

20 | Diagonal 4 | 2,4,10,100,500,1000,10000 | (2,2,…,2), (5,5,…,5), (10,10,…,10), (15,15,…,15)

21 | Denschnb | 2,4,10,100,500,1000,10000 | (3,3,…,3), (8,8,…,8), (9,9,…,9), (25,25,…,25)

22 | Extended Beale | 2,4,10,100,500,1000,10000 | (1,1,…,1), (3,3,…,3), (7,7,…,7), (10,10,…,10)

23 | Himmelblau | 2,4,10,100,500,1000,10000 | (25,25,…,25), (55,55,…,55), (101,101,…,101), (199,199,…,199)

24 | Quartic | 2,4,10,100,500,1000,10000 | (1,1,…,1), (3,3,…,3), (30,30,…,30), (100,100,…,100)


Figure 1: Performance profile based on number of iterations.

Figure 2: Performance profile based on the CPU time.

There is a need for a baseline for the comparisons; we compare the performance of solver s on problem p with the best performance by any solver on this problem, using the performance ratio

r_{p,s} = \frac{t_{p,s}}{\min\{ t_{p,s} : s \in S \}}.



Assume that a parameter r_M \ge r_{p,s} for all p, s is chosen, with r_{p,s} = r_M if and only if solver s does not solve problem p. The performance of solver s on any single problem might be of interest, but we would like an overall assessment of the performance of the solver; hence we define

\rho_s(t) = \frac{1}{n_p} \, \mathrm{size}\{ p \in P : r_{p,s} \le t \}.

Thus \rho_s(t) is the probability for solver s \in S that its performance ratio r_{p,s} is within a factor t \in R of the best possible ratio. The function \rho_s is the cumulative distribution function of the performance ratio. The performance profile \rho_s : R \to [0,1] of a solver is non-decreasing, piecewise constant, and continuous from the right. The value \rho_s(1) is the probability that the solver will win over the rest of the solvers. In general, a solver with high values of \rho_s(t), that is, one whose curve lies at the top right of the figure, is preferable and represents the best solver.
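The profile \rho_s(t) is straightforward to compute from a table of raw measurements. The following Python sketch (our own illustration, not the authors' code; the function name and the use of np.inf to mark failures are assumptions) implements the definition above:

import numpy as np

def performance_profile(t):
    # t[p, s] = measure (iterations or CPU time) for problem p and solver s;
    # np.inf marks a failure of solver s on problem p (i.e. r_{p,s} = r_M).
    t = np.asarray(t, dtype=float)
    n_p = t.shape[0]
    r = t / t.min(axis=1, keepdims=True)          # performance ratios r_{p,s}
    def rho(tau):
        # fraction of problems with r_{p,s} <= tau, for each solver s
        return (r <= tau).sum(axis=0) / n_p
    return rho

For instance, if times is an n_p-by-4 array holding the measurements for the FR, PRP, RAMI and RMAR runs, then performance_profile(times)(1.0) gives the fraction of problems on which each method was among the fastest.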

Figures 1 and 2 illustrate the performance of these methods based on the number of iterations and the CPU time, respectively. We can easily perceive that our proposed method is better than the FR method, which solves 83% of the test problems, and than the RAMI method, which solves 89% of the test problems. Although the PRP method seems to be faster than our new method, it can only solve 95% of the test problems. Therefore, we rank our new formula RMAR as the best, since it solves all of the test problems, achieving 100%.

5. Conclusion

In this paper, we present a new and simple \beta_k that satisfies the sufficient descent condition and converges globally. Numerical results clearly show that the proposed \beta_k performs better than the FR, PRP and RAMI methods. In the future, we intend to boost the performance of this new \beta_k^{RMAR} by employing inexact line searches.

Acknowledgments. The authors would like to thank the government of Malaysia

for the funding of this research under the Fundamental Research Grant Scheme

(Grant no. 59256), UniSZA/14/GU (018) and also the government of Kano state,

Nigeria.

References

[1] A. Abashar, M. Mamat, M. Rivaie and I. Mohd, Global convergence properties of a new class of conjugate gradient method for unconstrained optimization, Appl. Math. Sci. 8 (2014), 3307-3319. http://dx.doi.org/10.12988/ams.2014.43246


[2] B. T. Polyak, The conjugate gradient method in extremal problems, USSR

Comp. Math. Phys. 9 (1969), 94-112.

http://dx.doi.org/10.1016/0041-5553(69)90035-4

[3] D. Touati-Ahmed, C. Storey, Globally convergent hybrid conjugate gradient

methods, J. Optim. Theory Appl. 64 (1990), 379-397.

[4] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Math. Prog. 91 (2002), 201-213. http://dx.doi.org/10.1007/s101070100263

[5] G. Zoutendijk, Nonlinear programming computational methods, in: J. Abadie,

(Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, 37-86.

[6] J. C. Gilbert, J. Nocedal, Global convergence properties of conjugate gradient

methods for optimization, SIAM J. Optim. 2 (1992), 21-42.

http://dx.doi.org/10.1137/0802003

[7] L. Zhang, An improved Wei–Yao–Liu nonlinear conjugate gradient method

for optimization computation, Appl. Math. Comput. 215(2009), 2269–2274.

http://dx.doi.org/10.1016/j.amc.2009.08.016

[8] M. Al-Baali, Descent property and global convergence of the Fletcher-Reeves

method with inexact line search, IMA J. Numer. Anal. 5 (1985), 121-124.

http://dx.doi.org/10.1093/imanum/5.1.121

[9] M.J.D. Powell, Restart procedures for the conjugate gradient method, Math.

Program. 12(1977), 241-254. http://dx.doi.org/10.1007/bf01593790

[10] M.J.D. Powell, Nonconvex minimization calculations and the conjugate

gradient method in Lecture notes in mathematics, 1066, Springer-Verlag, Berlin,

(1984), 122–141. http://dx.doi.org/10.1007/bfb0099521

[11] M. R. Hestenes and E. L. Stiefel, Methods of conjugate gradients for solving

linear systems, J. Research Nat. Bur. Standards, 49 (1952), 409-436.

http://dx.doi.org/10.6028/jres.049.044

[12] M. Rivaie, M. Mamat, W.J. Leong, and M. Ismail, A new class of nonlinear

conjugate gradient coefficients with global convergence properties, Appl. Math.

Comput. 218 (2012), 11323-11332.

http://dx.doi.org/10.1016/j.amc.2012.05.030

[13] M. Rivaie, A. Abashar, M. Mamat and I. Mohd, The convergence properties

of a new type of conjugate gradient methods, Applied Mathematical Sciences, 8

(2014), 33-44. http://dx.doi.org/10.12988/ams.2014.310578


[14] M. Rivaie, M. Mamat, M. Ismail and M. Fauzi, A comparative study of

conjugate gradient coefficient for unconstrained optimization, Aus. J. Bas. Appl.

Sci. 5 (2011), 947-951.

[15] N. Andrei, An unconstrained optimization test functions collection, Adv.

Modell. Optim. 10 (2008), 147-161.

[16] R. Fletcher, C. Reeves, Function minimization by conjugate gradients,

Comput. J. 7 (1964), 149-154. http://dx.doi.org/10.1093/comjnl/7.2.149

[17] R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization, Wiley, New York, 1987.

[18] W. W. Hager and H. C. Zhang, A new conjugate gradient method with

guaranteed descent and efficient line search, SIAM J. Optim. 16 (2005), 170-192.

http://dx.doi.org/10.1137/030601880

[19] W. W. Hager and H. C. Zhang, A survey of nonlinear conjugate gradient methods, Pacific Journal of Optimization, 2 (2006), 35-58.

[20] Y. Liu, and C. Storey, Efficient generalized conjugate gradient algorithms.

Part 1: Theory, J. Optim. Theory Appl. 69 (1991), 129–137.

http://dx.doi.org/10.1007/bf00940464

[21] Y.H. Dai, and Y. Yuan, A nonlinear conjugate gradient method with a strong

global convergence property, SIAM J. Optim. 10 (2000), 177–182.

http://dx.doi.org/10.1137/s1052623497318992

[22] Z. Wei, S. Yao and L. Liu, The convergence properties of some new

conjugate gradient methods, Appl. Math. Comput. 183 (2006), 1341-1350.

http://dx.doi.org/10.1016/j.amc.2006.05.150

Received: December 10, 2014; Published: March 9, 2015