The Convergence Properties of a New Kind of Conjugate Gradient Method for Unconstrained Optimization
Applied Mathematical Sciences, Vol. 9, 2015, no. 38, 1845 - 1856
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2015.411997
Rabi’u Bashir Yunus¹*, Mustafa Mamat¹, Abdelrahman Abashar¹,²,
Mohd Rivaie³, Zabidin Salleh⁴, and Zahrahtul Amani Zakaria¹
¹Faculty of Informatics and Computing
Universiti Sultan Zainal Abidin (UniSZA), Kuala Terengganu, Malaysia
²Faculty of Engineering, Red Sea University, Sudan
³Department of Computer Sciences and Mathematics
Universiti Teknologi MARA (UiTM) Terengganu, Campus Kuala Terengganu, Malaysia
⁴School of Informatics and Applied Mathematics
Universiti Malaysia Terengganu, Malaysia
Copyright © 2014 Rabi’u Bashir Yunus et al. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Abstract
Conjugate gradient (CG) methods are among the most prominent techniques for
solving large-scale unconstrained optimization problems, owing to their
robustness, low memory requirements, and global convergence properties. Numerous
studies and modifications have been carried out recently to improve these
methods. In this paper, a new modification of the CG coefficient that possesses
global convergence properties is presented. The global convergence result is
established under exact line search. Numerical experiments show that the proposed
formula is robust and efficient when compared to other CG coefficients.
Keywords: Conjugate gradient coefficient; exact line search; global convergence
1. Introduction
The nonlinear conjugate gradient method is designed to solve the following
unconstrained optimization problem:
$$\min f(x), \quad x \in \mathbb{R}^n, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. The iterative formula
of the CG method is expressed as
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$
where $x_k$ is the current iterate and $\alpha_k$ is a stepsize obtained by carrying
out a one-dimensional search, known as the line search. The most common is the
exact line search, given by
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k). \tag{3}$$
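For a strictly convex quadratic $f(x) = \tfrac{1}{2} x^T A x - b^T x$ the minimization in (3) has the closed form $\alpha = -g^T d / (d^T A d)$ with $g = Ax - b$. The following is a minimal sketch under that quadratic assumption; the matrix `A`, the vector `b` and the helper names are illustrative and do not come from the experiments in this paper. For a general $f$ a one-dimensional minimizer would be needed instead.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def exact_step_quadratic(A, b, x, d):
    """Exact minimizer of alpha -> f(x + alpha*d) for f(x) = 0.5*x'Ax - b'x."""
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    return -dot(g, d) / dot(d, matvec(A, d))


# Illustrative data (assumed, not from the paper).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [2.0, 1.0]
g = [ax - bi for ax, bi in zip(matvec(A, x), b)]
d = [-gi for gi in g]  # steepest-descent direction
alpha = exact_step_quadratic(A, b, x, d)
x_new = [xi + alpha * di for xi, di in zip(x, d)]
g_new = [ax - bi for ax, bi in zip(matvec(A, x_new), b)]
```

After an exact step the new gradient `g_new` is orthogonal to the direction `d`, which is the property used repeatedly in the convergence analysis.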
The search direction $d_k$ is defined by
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases} \tag{4}$$
where $\beta_k$ is a parameter known as the conjugate gradient coefficient. Some
well-known classical formulas for $\beta_k$, namely Fletcher-Reeves (FR) [16],
Polak-Ribière-Polyak (PRP) [2], Hestenes-Stiefel (HS) [11], Liu-Storey (LS) [20],
Dai-Yuan (DY) [21], and conjugate descent (CD) [17], are given below:
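$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2} \tag{5}$$
$$\beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2} \tag{6}$$
$$\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})} \tag{7}$$
$$\beta_k^{LS} = -\frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T g_{k-1}} \tag{8}$$
$$\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})} \tag{9}$$
$$\beta_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}} \tag{10}$$
where $g_k$ and $g_{k-1}$ are the gradients of $f(x)$ at the points $x_k$ and $x_{k-1}$ respectively,
and $\|\cdot\|$ represents the Euclidean norm of vectors.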
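The six classical coefficients can be written as one short routine. The sketch below assumes the standard forms of (5)-(10) in terms of $g_k$, $g_{k-1}$ and $d_{k-1}$; the function name and the sample vectors are illustrative only.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def classical_betas(g, g_prev, d_prev):
    """Coefficients (5)-(10) for gradients g_k, g_{k-1} and direction d_{k-1}."""
    y = [a - b for a, b in zip(g, g_prev)]  # g_k - g_{k-1}
    return {
        "FR": dot(g, g) / dot(g_prev, g_prev),
        "PRP": dot(g, y) / dot(g_prev, g_prev),
        "HS": dot(g, y) / dot(d_prev, y),
        "LS": -dot(g, y) / dot(d_prev, g_prev),
        "DY": dot(g, g) / dot(d_prev, y),
        "CD": -dot(g, g) / dot(d_prev, g_prev),
    }


# First CG step: d_0 = -g_0, and g_1 is orthogonal to d_0 as it would be
# after an exact line search (sample values are assumptions).
g_prev = [3.0, 4.0]
d_prev = [-3.0, -4.0]
g = [4.0, -3.0]
betas = classical_betas(g, g_prev, d_prev)
```

In this example $g_1$ is orthogonal to $g_0 = -d_0$, so all six formulas return the same value; they differ once the gradients are no longer orthogonal or the line search is inexact.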
Many researchers have studied the global convergence of the above methods.
Zoutendijk [5] was the first to prove the global convergence of the FR method
with exact line search. Later, however, Powell [10] pointed out that the FR
method can perform poorly in practice because of the jamming phenomenon. The PRP
method was believed to be the most reliable CG method, but its convergence for
general nonlinear functions is uncertain [19]. Indeed, Powell [9] showed that,
even with an exact line search, the PRP method might not converge and could
cycle infinitely. For a sample of the research conducted on the properties of CG
methods, please refer to Al-Baali [8], Wei et al. [22], Touati-Ahmed and Storey
[3], Gilbert and Nocedal [6], Hager and Zhang [18], Abashar et al. [1] and
Rivaie et al. [12,13,14].
In this paper, we present our new $\beta_k$ and compare its performance with the
classical FR and PRP formulas and the RAMI method (see Rivaie et al. [13]). The
remaining sections of the paper are organised as follows. In Section 2, a new
CG coefficient as well as a general algorithm for CG methods is presented. In
Section 3, we give the sufficient descent and global convergence proofs of our
new method. Numerical results and discussions are presented in Section 4.
Finally, our conclusion based on these comparisons is drawn in Section 5.
2. New proposed method
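Recently, Rivaie et al. [13] proposed a new nonlinear conjugate gradient formula
defined by
$$\beta_k^{RAMI} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{d_{k-1}^T (d_{k-1} - g_k)}. \tag{11}$$
Motivated by the above, we propose our new $\beta_k$, denoted $\beta_k^{RMAR}$, where RMAR
stands for Rabi’u, Mustafa, Abdelrahman and Rivaie. The $\beta_k^{RMAR}$ is defined as
$$\beta_k^{RMAR} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|d_{k-1}\|} d_{k-1} \right)}{\|d_{k-1}\|^2}. \tag{12}$$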
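As an illustration, the coefficient can be evaluated directly from $g_k$ and $d_{k-1}$. The sketch below assumes that (12) reads $\beta_k^{RMAR} = g_k^T\big(g_k - (\|g_k\|/\|d_{k-1}\|)\, d_{k-1}\big) / \|d_{k-1}\|^2$; the function name and the sample vectors are illustrative.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def beta_rmar(g, d_prev):
    """RMAR coefficient, assuming the form of (12) stated in the lead-in."""
    ratio = math.sqrt(dot(g, g)) / math.sqrt(dot(d_prev, d_prev))
    w = [gi - ratio * di for gi, di in zip(g, d_prev)]  # g_k - (|g_k|/|d_{k-1}|) d_{k-1}
    return dot(g, w) / dot(d_prev, d_prev)


# g_k orthogonal to d_{k-1}: the value reduces to |g_k|^2 / |d_{k-1}|^2.
beta_orthogonal = beta_rmar([4.0, -3.0], [-3.0, -4.0])
# g_k parallel to d_{k-1}: the numerator vanishes and the direction is reset.
beta_parallel = beta_rmar([1.0, 0.0], [1.0, 0.0])
```

The two sample calls show the extremes that matter for the analysis: the coefficient equals $\|g_k\|^2 / \|d_{k-1}\|^2$ when $g_k^T d_{k-1} = 0$, and it is zero when $g_k$ points along $d_{k-1}$.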
The algorithm is given as follows:
Step 1: Initialization. Given $x_0 \in \mathbb{R}^n$, set $k = 0$.
Step 2: Compute $\beta_k$ based on (6), (7) and (12).
Step 3: Compute $d_k$ based on (4). If $g_k = 0$, then stop.
Step 4: Compute $\alpha_k$ based on exact line search (3).
Step 5: Update the new point based on the iterative formula (2).
Step 6: Convergence test and stopping criteria.
If $f(x_{k+1}) < f(x_k)$ and $\|g_k\| \le \varepsilon$, then stop.
Otherwise go to Step 2 with $k = k + 1$.
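The steps above can be sketched for the special case of a strictly convex quadratic, where the exact line search has a closed form. This is a minimal sketch, not the MATLAB code used in the experiments: the quadratic model, the function name `cg_rmar_quadratic`, the iteration cap and the test data are all assumptions, and only the gradient-norm part of the stopping test in Step 6 is implemented.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def cg_rmar_quadratic(A, b, x0, eps=1e-6, max_iter=1000):
    """Minimize 0.5*x'Ax - b'x (A symmetric positive definite) with the RMAR coefficient."""
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient g_0 = A x_0 - b
    d = [-gi for gi in g]                             # d_0 = -g_0, formula (4)
    k = 0
    while math.sqrt(dot(g, g)) > eps and k < max_iter:
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                   # exact line search (3)
        x = [xi + alpha * di for xi, di in zip(x, d)]     # update (2)
        g = [gi + alpha * adi for gi, adi in zip(g, Ad)]  # new gradient (valid for quadratics)
        ratio = math.sqrt(dot(g, g)) / math.sqrt(dot(d, d))
        beta = dot(g, [gi - ratio * di for gi, di in zip(g, d)]) / dot(d, d)  # (12)
        d = [-gi + beta * di for gi, di in zip(g, d)]     # new direction, formula (4)
        k += 1
    return x, k


# Illustrative 2x2 problem (assumed data); the exact minimizer is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_star, iterations = cg_rmar_quadratic(A, b, [2.0, 1.0])
```

For a non-quadratic objective the closed-form step would have to be replaced by a numerical one-dimensional minimization.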
3. Convergence analysis
In this section, the convergence properties of $\beta_k^{RMAR}$ will be studied. We begin with
the sufficient descent condition.
3.1 Sufficient descent condition
The sufficient descent condition holds if
$$g_k^T d_k \le -C \|g_k\|^2 \quad \text{for all } k \ge 0 \text{ and some constant } C > 0. \tag{13}$$
The following theorem shows that our new formula with exact line search satisfies
the sufficient descent condition.
Theorem 1
Consider a CG method with search direction (4) and $\beta_k^{RMAR}$ given as (12), then
condition (13) holds for all $k \ge 0$.
Proof.
The proof is by induction. If $k = 0$, then $d_0 = -g_0$ and $g_0^T d_0 = -\|g_0\|^2$. Hence,
condition (13) holds true with $C = 1$.
We also need to show that, for all $k \ge 0$, condition (13) holds true for $k + 1$. From
(4), multiplying both sides by $g_{k+1}^T$, we obtain
$$g_{k+1}^T d_{k+1} = g_{k+1}^T \left( -g_{k+1} + \beta_{k+1}^{RMAR} d_k \right) = -\|g_{k+1}\|^2 + \beta_{k+1}^{RMAR} g_{k+1}^T d_k. \tag{14}$$
For exact line search, we know that $g_{k+1}^T d_k = 0$. Thus,
$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2.$$
Hence, this condition holds true for $k + 1$. The proof is completed. ■
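The key step of the proof, namely that the term with the coefficient drops out once $g_{k+1}^T d_k = 0$, can be checked numerically. The sketch below uses made-up vectors with $g_{k+1}$ orthogonal to $d_k$ and several arbitrary coefficient values; none of these numbers come from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Assumed sample data: g_next plays the role of g_{k+1}, d the role of d_k,
# chosen orthogonal as they would be after an exact line search.
g_next = [4.0, -3.0]
d = [-3.0, -4.0]

descent_values = []
for beta in (0.0, 0.5, 2.0):
    # Direction update from formula (4) with an arbitrary coefficient beta.
    d_next = [-gi + beta * di for gi, di in zip(g_next, d)]
    descent_values.append(dot(g_next, d_next))
```

Every entry of `descent_values` equals $-\|g_{k+1}\|^2$ regardless of the coefficient, which is why the sufficient descent condition under exact line search does not depend on the particular choice of $\beta_k$.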
3.2 Global convergence properties
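To study the global convergence properties, we first need to show that $\beta_k^{RMAR}$ is
always nonnegative:
$$\beta_k^{RMAR} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|d_{k-1}\|} d_{k-1} \right)}{\|d_{k-1}\|^2} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|d_{k-1}\|} g_k^T d_{k-1}}{\|d_{k-1}\|^2} \ge 0, \tag{15}$$
where the inequality follows from the Cauchy-Schwarz inequality $g_k^T d_{k-1} \le \|g_k\| \|d_{k-1}\|$.
Secondly, we need an upper bound on $\beta_k^{RMAR}$:
$$\frac{\|g_k\|^2 - \frac{\|g_k\|}{\|d_{k-1}\|} g_k^T d_{k-1}}{\|d_{k-1}\|^2} \le \frac{\|g_k\|^2 + \frac{\|g_k\|}{\|d_{k-1}\|} \|g_k\| \|d_{k-1}\|}{\|d_{k-1}\|^2} = \frac{2\|g_k\|^2}{\|d_{k-1}\|^2}. \tag{16}$$
Therefore, $0 \le \beta_k^{RMAR} \le \dfrac{2\|g_k\|^2}{\|d_{k-1}\|^2}$.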
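The two-sided bound $0 \le \beta_k^{RMAR} \le 2\|g_k\|^2 / \|d_{k-1}\|^2$ can be spot-checked on random vectors. This is only a numerical sanity check under the assumption that (12) has the form $g_k^T\big(g_k - (\|g_k\|/\|d_{k-1}\|)\, d_{k-1}\big)/\|d_{k-1}\|^2$; the sample size, dimension, seed and tolerance are arbitrary choices.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def beta_rmar(g, d_prev):
    """RMAR coefficient, assuming the form of (12) stated in the lead-in."""
    ratio = math.sqrt(dot(g, g)) / math.sqrt(dot(d_prev, d_prev))
    w = [gi - ratio * di for gi, di in zip(g, d_prev)]
    return dot(g, w) / dot(d_prev, d_prev)


random.seed(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(1000):
    g = [random.uniform(-1.0, 1.0) for _ in range(5)]
    d = [random.uniform(-1.0, 1.0) for _ in range(5)]
    beta = beta_rmar(g, d)
    upper = 2.0 * dot(g, g) / dot(d, d)
    # Small tolerance to absorb floating-point rounding.
    if not (-1e-9 <= beta <= upper + 1e-9):
        violations += 1
```

The lower bound is attained when $g_k$ is a positive multiple of $d_{k-1}$ and the upper bound when it is a negative multiple, so random samples fall strictly between the two.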
The following assumptions are required for the global convergence analysis of CG
methods.
Assumption 1
(i) The level set $\Omega = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is bounded, where $x_0$ is the
starting point, and $f$ is continuously differentiable in a neighborhood $N$
of the level set $\Omega$.
(ii) The gradient $g(x)$ is Lipschitz continuous in $N$; that is, there exists a constant
$L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for any $x, y \in N$.
Under this assumption we have the following important results.
Lemma 1
Suppose Assumption 1 holds true. Consider any CG method of the form (2) and
(4), where $\alpha_k$ is obtained by exact line search (3). Then the following condition, known
as the Zoutendijk condition (see Zoutendijk [5]), holds:
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{17}$$
The proof of the above lemma can be found in [5]. The following convergence
theorem of the CG method is based on Lemma 1.
Theorem 2
Suppose that Assumption 1 holds. Consider any CG method of the form (2) and
(4), where $\alpha_k$ is obtained by exact line search (3) and $\beta_k$ is determined by (12). Then
$$\lim_{k \to \infty} \|g_k\| = 0. \tag{18}$$
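Proof.
Rewriting (4) as
$$d_{k+1} + g_{k+1} = \beta_{k+1}^{RMAR} d_k,$$
and squaring both sides of the equation, we obtain
$$\|d_{k+1}\|^2 = (\beta_{k+1}^{RMAR})^2 \|d_k\|^2 - 2 g_{k+1}^T d_{k+1} - \|g_{k+1}\|^2. \tag{19}$$
Substituting (16) into (19), we have
$$\|d_{k+1}\|^2 \le \left( \frac{2\|g_{k+1}\|^2}{\|d_k\|^2} \right)^2 \|d_k\|^2 - 2 g_{k+1}^T d_{k+1} - \|g_{k+1}\|^2,$$
that is,
$$\|d_{k+1}\|^2 \le \frac{4\|g_{k+1}\|^4}{\|d_k\|^2} - 2 g_{k+1}^T d_{k+1} - \|g_{k+1}\|^2. \tag{20}$$
We have already proven that the sufficient descent condition holds. Under exact
line search it holds with equality (see (14)), so we know that
$$g_{k+1}^T d_{k+1} = -c \|g_{k+1}\|^2, \quad c = 1.$$
Hence, from (20),
$$\|d_{k+1}\|^2 \le \frac{4\|g_{k+1}\|^4}{\|d_k\|^2} + 2c \|g_{k+1}\|^2 - \|g_{k+1}\|^2,$$
$$\|d_{k+1}\|^2 \le \frac{4\|g_{k+1}\|^4}{\|d_k\|^2} - \|g_{k+1}\|^2 (1 - 2c). \tag{21}$$
Multiplying both sides of (21) by $\|g_{k+1}\|^2 / \|d_{k+1}\|^2$, we obtain
$$\|d_{k+1}\|^2 \frac{\|g_{k+1}\|^2}{\|d_{k+1}\|^2} \le \frac{\|g_{k+1}\|^2}{\|d_{k+1}\|^2} \left( \frac{4\|g_{k+1}\|^4}{\|d_k\|^2} - \|g_{k+1}\|^2 (1 - 2c) \right).$$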
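$$\|d_{k+1}\|^2 \frac{\|g_{k+1}\|^2}{\|d_{k+1}\|^2} \le \frac{\|g_{k+1}\|^4}{\|d_{k+1}\|^2} \left( \frac{4\|g_{k+1}\|^2}{\|d_k\|^2} - (1 - 2c) \right).$$
The left-hand side equals $\|g_{k+1}\|^2$. Provided the factor in parentheses stays
bounded above by some constant $\omega > 0$ along the iterates, this gives
$$\|g_{k+1}\|^2 \le \omega \frac{\|g_{k+1}\|^4}{\|d_{k+1}\|^2}. \tag{22}$$
Based on Lemma 1 we know that
$$\lim_{k \to \infty} \frac{(g_{k+1}^T d_{k+1})^2}{\|d_{k+1}\|^2} = 0,$$
and with exact line search $(g_{k+1}^T d_{k+1})^2 = \|g_{k+1}\|^4$. If Theorem 2 were not true,
$\|g_{k+1}\|$ would stay bounded away from zero along a subsequence, and by (22) the ratio
$\|g_{k+1}\|^4 / \|d_{k+1}\|^2$ would stay bounded away from zero along that subsequence,
contradicting the limit above. Hence, Theorem 2 holds. ■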
4. Numerical results
In this section, we carry out numerical experiments to test Algorithm 2.1. We
use some of the test problems considered in Andrei [15], listed in Table 1, to
analyse the efficiency of our new formula compared with the FR, PRP, and RAMI
methods. The comparisons are based on the number of iterations and the CPU time
in seconds. The step size $\alpha_k$ is obtained using exact line search. We use
$\|g_k\| \le \varepsilon$ with $\varepsilon = 10^{-6}$ as the stopping criterion. All problems listed in
Table 1 are solved using MATLAB version 7.6.0 (R2008a) subroutines. The
processor used was an Intel® Celeron (1.4 GHz, 2MB L3 Cache), with 4GB
DDR3 RAM. The performance results are shown in Figures 1 and 2, based on the
performance profile introduced by Dolan and Moré [4].
The idea behind this performance profile is to evaluate and compare the
performance of a set of solvers $S$ on a test set $P$. Supposing that there are $n_s$
solvers and $n_p$ problems, for each problem $p$ and solver $s$ they defined
$t_{p,s}$ = computing cost required to solve problem $p$ by solver $s$ (the number of
iterations or the CPU time).
Table 1: A list of problem functions
No  Function               Dimension                  Initial Points
1   Three Hump Camel       2                          (-2,2), (2,-2), (11,11), (15,15)
2   Six Hump Camel         2                          (8,8), (-8,-8), (10,10), (-10,-10)
3   Booth                  2                          (10,10), (25,25), (50,50), (100,100)
4   Treccani               2                          (5,5), (15,15), (50,50), (100,100)
5   Zettl                  2                          (5,5), (10,10), (20,20), (50,50)
6   Fletcher               2,4,10                     (3,3,…,3), (5,5,…,5), (7,7,…,7), (11,11,…,11)
7   Sphere                 2,4,10                     (5,5,…,5), (7,7,…,7), (10,10,…,10), (100,100,…,100)
8   Tridiagonal 2          2,4,10                     (5,5,…,5), (10,10,…,10), (15,15,…,15), (20,20,…,20)
9   Tridiagonal 1          2,4,100                    (18,18,…,18), (19,19,…,19), (22,22,…,22), (30,30,…,30)
10  Extended Penalty       2,4,10,100                 (101,101,…,101), (109,109,…,109), (141,141,…,141), (200,200,…,200)
11  Raydan 1               2,4,10,100                 (1,1,…,1), (3,3,…,3), (-5,-5,…,-5), (-10,-10,…,-10)
12  Extended Maratos       2,4,10,100                 (15,15,…,15), (16,16,…,16), (25,25,…,25), (55,55,…,55)
13  Hager                  2,4,10,100                 (3,3,…,3), (10,10,…,10), (21,21,…,21), (23,23,…,23)
14  Quadratic QP2          2,4,10,100,500             (10,10,…,10), (20,20,…,20), (35,35,…,35), (100,100,…,100)
15  Freudenstein and Roth  2,4,10,100,500,1000        (3,3,…,3), (5,5,…,5), (7,7,…,7), (12,12,…,12)
16  Shallow                2,4,10,100,500,1000,10000  (10,10,…,10), (25,25,…,25), (50,50,…,50), (100,100,…,100)
17  Ex-Tridiagonal 1       2,4,10,100,500,1000,10000  (3,3,…,3), (7,7,…,7), (9,9,…,9), (20,20,…,20)
18  White and Holst        2,4,10,100,500,1000,10000  (3,3,…,3), (6,6,…,6), (9,9,…,9), (12,12,…,12)
19  Rosenbrock             2,4,10,100,500,1000,10000  (13,13,…,13), (16,16,…,16), (20,20,…,20), (30,30,…,30)
20  Diagonal 4             2,4,10,100,500,1000,10000  (2,2,…,2), (5,5,…,5), (10,10,…,10), (15,15,…,15)
21  Denschnb               2,4,10,100,500,1000,10000  (3,3,…,3), (8,8,…,8), (9,9,…,9), (25,25,…,25)
22  Extended Beale         2,4,10,100,500,1000,10000  (1,1,…,1), (3,3,…,3), (7,7,…,7), (10,10,…,10)
23  Himmelblau             2,4,10,100,500,1000,10000  (25,25,…,25), (55,55,…,55), (101,101,…,101), (199,199,…,199)
24  Quartic                2,4,10,100,500,1000,10000  (1,1,…,1), (3,3,…,3), (30,30,…,30), (100,100,…,100)
Figure 1: Performance profile based on number of iterations.
Figure 2: Performance profile based on the CPU time.
To provide a baseline for comparison, the performance of solver $s$ on problem
$p$ is compared with the best performance by any solver on this problem; that
is, we use the performance ratio
$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}.$$
[Figures 1 and 2 (plots): horizontal axis t with ticks e0, e1, e2, e3, e4; vertical axis Ps(t) from 0.0 to 1.0; curves for FR, PRP, RAMI and RMAR.]
Assume that a parameter $r_M \ge r_{p,s}$ for all $p, s$ is chosen, and $r_{p,s} = r_M$ if and
only if solver $s$ does not solve problem $p$. The performance of solver $s$ on any
given problem might be of interest, but we would like to obtain an overall
assessment of the performance of the solver, so they defined
$$P_s(t) = \frac{1}{n_p} \, \mathrm{size}\{p \in P : r_{p,s} \le t\}.$$
Thus $P_s(t)$ is the probability for solver $s \in S$ that a performance ratio $r_{p,s}$ is
within a factor $t \in \mathbb{R}$ of the best possible ratio. The function $P_s$ is the
cumulative distribution function for the performance ratio. The performance
profile $P_s : \mathbb{R} \to [0,1]$ for a solver is a non-decreasing, piecewise constant function,
continuous from the right. The value of $P_s(1)$ is the probability that the solver will win over
the rest of the solvers. In general, a solver with high values of $P_s(t)$, that is, one whose curve lies at the top
right of the figure, is preferable.
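The performance profile described above can be computed in a few lines. The sketch below is an illustration only: the function name `performance_profile`, the solver labels "A" and "B", and the cost table are made up, and a failed run is encoded as `math.inf`, which plays the role of the parameter $r_M$.

```python
import math


def performance_profile(times, t_values):
    """Return P_s(t) for each solver s and each t in t_values.

    times[s][p] is the cost (iterations or CPU time, assumed positive) of
    solver s on problem p, or math.inf if the solver failed on that problem.
    """
    solvers = list(times)
    n_p = len(times[solvers[0]])
    # Best cost achieved by any solver on each problem.
    best = [min(times[s][p] for s in solvers) for p in range(n_p)]
    profile = {}
    for s in solvers:
        ratios = [times[s][p] / best[p] for p in range(n_p)]  # r_{p,s}
        profile[s] = [sum(r <= t for r in ratios) / n_p for t in t_values]
    return profile


# Hypothetical costs of two solvers on four problems (not data from this paper).
times = {
    "A": [1.0, 2.0, 4.0, math.inf],  # solver A fails on the last problem
    "B": [2.0, 2.0, 1.0, 3.0],
}
profile = performance_profile(times, [1.0, 2.0, 4.0])
```

In this toy example solver "A" never reaches a profile value of 1 because of its failure, whereas solver "B" reaches 1 at $t = 2$, mirroring how the share of solved problems is read from the right-hand end of each curve.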
Figures 1 and 2 illustrate the performance of these methods based on the number
of iterations and the CPU time, respectively. Our proposed method compares
favourably with the FR method, which solves 83% of the test problems, and with
the RAMI method, which solves 89% of the test problems. Although the PRP method
seems to be faster than our new method, it solves only 95% of the test
problems. Therefore, we rank our new formula RMAR as the best, since it solves
all of the test problems (100%).
5. Conclusion
In this paper, we present a new and simple $\beta_k$ that satisfies the sufficient descent
condition and converges globally under exact line search. Numerical results show
that the proposed $\beta_k$ solves a larger share of the test problems than the FR,
PRP and RAMI methods. In the future, we intend to improve the performance of
this new $\beta_k^{RMAR}$ by employing inexact line searches.
Acknowledgments. The authors would like to thank the government of Malaysia
for the funding of this research under the Fundamental Research Grant Scheme
(Grant no. 59256), UniSZA/14/GU (018) and also the government of Kano state,
Nigeria.
References
[1] A. Abashar, M. Mamat, M. Rivaie and I. Mohd, Global convergence
properties of a new class of conjugate gradient method for unconstrained
optimization, Appl. Math. Sci. 8 (2014), 3307-3319.
http://dx.doi.org/10.12988/ams.2014.43246
[2] B. T. Polyak, The conjugate gradient method in extremal problems, USSR
Comp. Math. Phys. 9 (1969), 94-112.
http://dx.doi.org/10.1016/0041-5553(69)90035-4
[3] D. Touati-Ahmed, C. Storey, Globally convergent hybrid conjugate gradient
methods, J. Optim. Theory Appl. 64 (1990), 379-397.
[4] E. Dolan, J. J. Moré, Benchmarking optimization software with performance
profiles, Math. Prog. 91 (2002), 201-213. http://dx.doi.org/10.1007/s101070100263
[5] G. Zoutendijk, Nonlinear programming computational methods, in: J. Abadie,
(Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, 37-
86.
[6] J. C. Gilbert, J. Nocedal, Global convergence properties of conjugate gradient
methods for optimization, SIAM J. Optim. 2 (1992), 21-42.
http://dx.doi.org/10.1137/0802003
[7] L. Zhang, An improved Wei–Yao–Liu nonlinear conjugate gradient method
for optimization computation, Appl. Math. Comput. 215(2009), 2269–2274.
http://dx.doi.org/10.1016/j.amc.2009.08.016
[8] M. Al-Baali, Descent property and global convergence of the Fletcher-Reeves
method with inexact line search, IMA J. Numer. Anal. 5 (1985), 121-124.
http://dx.doi.org/10.1093/imanum/5.1.121
[9] M.J.D. Powell, Restart procedures for the conjugate gradient method, Math.
Program. 12(1977), 241-254. http://dx.doi.org/10.1007/bf01593790
[10] M.J.D. Powell, Nonconvex minimization calculations and the conjugate
gradient method in Lecture notes in mathematics, 1066, Springer-Verlag, Berlin,
(1984), 122–141. http://dx.doi.org/10.1007/bfb0099521
[11] M. R. Hestenes and E. L. Stiefel, Methods of conjugate gradients for solving
linear systems, J. Research Nat. Bur. Standards, 49 (1952), 409-436.
http://dx.doi.org/10.6028/jres.049.044
[12] M. Rivaie, M. Mamat, W.J. Leong, and M. Ismail, A new class of nonlinear
conjugate gradient coefficients with global convergence properties, Appl. Math.
Comput. 218 (2012), 11323-11332.
http://dx.doi.org/10.1016/j.amc.2012.05.030
[13] M. Rivaie, A. Abashar, M. Mamat and I. Mohd, The convergence properties
of a new type of conjugate gradient methods, Applied Mathematical Sciences, 8
(2014), 33-44. http://dx.doi.org/10.12988/ams.2014.310578
[14] M. Rivaie, M. Mamat, M. Ismail and M. Fauzi, A comparative study of
conjugate gradient coefficient for unconstrained optimization, Aus. J. Bas. Appl.
Sci. 5 (2011), 947-951.
[15] N. Andrei, An unconstrained optimization test functions collection, Adv.
Modell. Optim. 10 (2008), 147-161.
[16] R. Fletcher, C. Reeves, Function minimization by conjugate gradients,
Comput. J. 7 (1964), 149-154. http://dx.doi.org/10.1093/comjnl/7.2.149
[17] R. Fletcher, Practical Methods of Optimization, Vol. 1, Unconstrained
Optimization, Wiley, New York, 1987.
[18] W.W Hager, and H.C. Zhang, A new conjugate gradient method with
guaranteed descent and efficient line search, SIAM J. Optim. 16 (2005), 170-192.
http://dx.doi.org/10.1137/030601880
[19] W. W. Hager and H.C. Zhang, A survey of nonlinear conjugate gradient
methods, Pacific Journal of Optimization 2(1), (2006), 335-58.
[20] Y. Liu, and C. Storey, Efficient generalized conjugate gradient algorithms.
Part 1: Theory, J. Optim. Theory Appl. 69 (1991), 129–137.
http://dx.doi.org/10.1007/bf00940464
[21] Y.H. Dai, and Y. Yuan, A nonlinear conjugate gradient method with a strong
global convergence property, SIAM J. Optim. 10 (2000), 177–182.
http://dx.doi.org/10.1137/s1052623497318992
[22] Z. Wei, S. Yao and L. Liu, The convergence properties of some new
conjugate gradient methods, Appl. Math. Comput. 183 (2006), 1341-1350.
http://dx.doi.org/10.1016/j.amc.2006.05.150
Received: December 10, 2014; Published: March 9, 2015