DONG Xiao-liang,HE Yu-bo,KONG Xiang-yu,LI Wei-jun
(1.School of Mathematics and Information,Beifang University of Nationalities, Yinchuan,710021,China)
(2.Department of Mathematics and Application Mathematics,Huaihua University, Huaihua,418008,China)
(3.Network Information Technology Center,Beifang University of Nationalities, Yinchuan,710021,China)
A NEW CONJUGATE GRADIENT METHOD WITH STRONGLY GLOBAL CONVERGENCE AND SUFFICIENT DESCENT CONDITION
DONG Xiao-liang1,HE Yu-bo2,KONG Xiang-yu1,LI Wei-jun3
(1.School of Mathematics and Information,Beifang University of Nationalities, Yinchuan,710021,China)
(2.Department of Mathematics and Application Mathematics,Huaihua University, Huaihua,418008,China)
(3.Network Information Technology Center,Beifang University of Nationalities, Yinchuan,710021,China)
In this paper,we study the WYL conjugate gradient method for unconstrained optimization problems.By making use of the modified iterative scheme,the suffi cient descent conditions are satisfi ed at each iteration independent of the line search used.Also,by removing the original restriction on the parameter of the Wolfe conditions,we establish the strongly global convergence property for the general function.Numerical results illustrate that our method is effi cient for the test problems.
conjugate gradient method;suffi cient descent condition;strongly global convergence;Wolfe line search
Consider the following unconstrained optimization problem
where f:Rn→ R is a smooth and nonlinear function,and its gradient gk= ▽f(xk)is available.The conjugate gradient(CG)methods represent an important class of unconstrained optimization algorithms with strong local and global convergence properties and modest memory requirements.
The iterative formula of the CG method is given by
In(1.2),dkis a search direction updated by
and the steplength αk> 0 is commonly chosen to satisfy certain line search conditions. Among them,the so-called Wolfe conditions have attracted special attention in the convergence analyses and the implementations of CG methods,requiring that
where 0 < ρ < σ < 1 are often imposed on the line search.
The well-known formulas βkare Fletcher-Reeves,Hestenes–Stiefel,Polak-Ribi`ere-Polyak and Dai-Yuan formulas,which are specified by
Little is known concerning global convergence of the HS method for nonconvex minimization problems.However,the HS method,which resembles the PRP method in practical computation and is often recommended due to its superior numerical properties.
Recently,various modifications of the HS method received growing interests,in which suffi cient descent condition is important in the convergence analysis of CG method.
Hager and Zhang[2]proposed the CG DESCENT method and proved its convergence with the Wolfe search.Based on the MBFGS method[3],Zhang[4]and Dai[5]introduced modified HZ methods with the Armijo search fornonconvex objective functions.Zhang,Zhou and Li[6]proposed a three·term modifi ed PRP method,in which the property ?dTkgk= ‖gk‖2always holds without any line searches.To seek the CG direction that is closest to the direction of the scaled memoryless BFGS method,Dai and Kou[7]proposed a family of CGOPT methods.Yu et al.[8,9]proposed several modified spectral CG methods.For further study on the some recent advances,we may refer to[10–15].
In this paper,we willcontinue to study the HS-type method.One feature ofour method is that our method guarantee suffi cient descent condition and strongly global convergence.
The rest of this paper is organized as follows.In the next section,the new method is presented formally.Meanwhile,we prove the proposed method possesses suffi cient descentand strongly global convergence properties with the Wolfe line search.In Section 4,we report numerical comparisons with existing methods by using some test problems.
Recently,Wei,Yao and Huang[16]proposed the MHS,WYL and MLS methods,in which the parameters βkare given by
The globalconvergence ofthese methodswith the strong Wolfe line search was proved for cases that the parameter σ in(1.5)is constrained torespectively.Furthermore,with the same restriction on the parameter σ,these methods above possessed the suffi cient descent condition.
The above discussion prompts naturally us to raise the following question:
Is it possible to modify the direction of the MHS method in a suitable way,thereby enlarging its suffi cient descent properties and ensuring the globalconvergence for the general functions?
By taking the theoreticaladvantage ofthe MHS method into consideration,we give another method to guarantee suffi cient descent condition,in which strongly globalconvergence is satisfied.With our choices,the additionalrestriction on the parameter σ in(1.5)can be removed.The direction dkin our method is given by where ε1> 0 is a constant.
In(2.2),the parameters βkDHSand ?kare chosen as
where λ > 1 and ε1> 0 are two given constants.
For convenience,we call our method(2.2)as DHS method in later part of this paper and formally state the steps of this method as follows.
Algorithm 2.1(DHS method)
Step 1Choosing constants λ > 1, ε1> 0 and ε > 0.Select an initial point x1∈ Rn, set k=1.
Step 2Test a criterion for stopping the iterations.If ‖gk‖ < ε,then stop,otherwise calculate dkby(2.2).
Step 3Determine the steplength by the Wolfe conditions.
Step 4Calculate the new iterate by xk+1=xk+ αkdk,set k=k+1 and goto Step 2.
The descent property is an indispensable factor in the convergence analysis of CG method.More exactly,if there exists a constant c1> 0 such that
holds for all k ∈ N,then the so-called suffi cient descent condition holds.
Property(2.5)is guaranteed in our method,as proved in the following lemma.
Lemma 2.1Let{xk}and{dk}be generated by the DHS method.Then the direction dksatisfies the suffi cient descent condition(2.5),which is independent of any line search.
ProofFor k=1,it follows that dT1g1= ?‖g1‖2.Now we mainly consider the case where
is satisfied.It follows from(2.2)that
Then the conclusion holds by letting c1=1 ? λ?1.
In this section,we prove the global convergence of the proposed CG method.We need the following assumptions,which are generally assumed in the literature.
Assumption 3.1
Boundedness Assumption:the levelset defined by ? ={x ∈ Rn|f(x) ≤ f(x1)}with x1to be the initialpoint,is bounded;
Lipschitz Assumption:in some neighborhood ?0of ?,the objective function f is continuously differentiable,and its gradient g is Lipschitz continuous,namely,there exist a constant L > 0 such that
Theorem 3.1Suppose that Assumption 3.1 holds.Let{xk}and{dk}be generated by Algorithm 2.1.Then there exists a constant c such that
ProofSince d1= ?g1,the result obviously holds for k=1.
If k > 1,it suffi ce to consider the case whereholds. It follows form the definition βkDHSin(2.3)and the Cauchy inequality that
Also,we can estimate the upper bound for|?k|,presented by
Combining this with(3.3)yields
The conclusion of the following theorem,called the Zoutendijk condition,is often used to prove globalconvergence of nonlinear CG method.
Theorem 3.2(see[17])Suppose that Assumption 3.1 holds.Consider the general CG method,where dkis a descent direction and αksatisfies the Wolfe conditions.Then we have
we complete the proof.
For the generalfunction,we can develop a strongly globalconvergence result as follows.
Theorem 3.3Suppose that Assumption 3.1 holds.Let{xk}be generated by Algorithm
2.1.Then
ProofThe bound for||dk||in(3.2)coupled with(2.5)indicates that
Equation(3.7)leads to(3.6).
In this section,we provide the implementation detailof the new methods to verify the numericalperformance.Our tests problems come form Mor′e[18].
We stop the iteration if the inequality||gk|| ≤ 10?6is satisfied.We use “F”to denote the numbers of iteration exceed 2000.All algorithms use exactly the same implementation of the strong Wolfe conditions with ρ =10?3,σ =0.5.The parameters in(2.2)are specified by ε1=10?12,μ =10.
In order to rank the CG methods,we subsequently use the method of Dai and Ni[19] to compare the performance of our methods with that of the other methods in[16].
First,we compute the total number of the function and its gradient evaluations by the formula Ntotal=N F+m ? N G,where m=5.
Second,for all our two methods,we evaluate their effi ciency with respect to the WYL method in the following way.For each problem i,compute the ratio
and the geometric mean of these ratios over all the test problems,where S denotes the set of the problems and|S|the number of elements in S.
Finally,we present the rtotalin the Table 4.2:
[1]Dai Y H,Yuan Y X.Nonlinear conjugate gradient methods[M].Shanghai:Shanghai Science and Technology Press,2000.
[2]Hager WW,Zhang H C.A new conjugate gradient method with guaranteed descent and an effi cient line search[J].SIAM J.Optim.,2005,16(1):170–192.
[3]Li D H,Fukushima M.A modified BFGS method and its global convergence in nonconvex minimizationp[J].J.Comput.Appl.Math.,2001,129(1):15–35.
[4]Zhang L,Zhou W J.On the global convergence of Hager–Zhang conjugate gradient with Armijo line search[J].Math.Numer.Sin.,2008,28(5):840–845.
[5]Dai Z F,Wen F H.Global convergence of a modifi ed Hestenes–Stiefel nonlinear conjugate gradient method with Armijo line search[J].Numer.Algor.,2012,59(1):79–93.
[6]Zhang L,Zhou W J,Li D H.A descent modified Polak–Ribi`ere–Polyak conjugate gradient method and its global convergence[J].IMA J.Numer.Anal.,2006,26(4):629–640.
[7]Dai Y H,Kou C X.A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search[J].SIAM J.Optim.,2013,23(1):296–320.
[8]Yu G H,Guan L T,Chen WF.Spectralconjugate gradient methods with suffi cient descent property for large-scale unconstrained optimization[J].Optim.Meth.Softw.,2008,23(2):275–293.
[9]Yu G H.New descent nonlinear conjugate gradient methods for large–scale unconstrained optimization,Tech.Rep.Department of scientific computation and computer applications[D].Guangzhou: Sun Yat-Sen University,2005.
[10]Dong X,Liu H,He Y,Yang X.Amodifi ed Hestenes–Stiefelconjugate gradient method with suffi cient descent condition and conjugac condition[J].J.Comp.Appl.Math.,2015,281:239–249.
[11]Dong X,Liu H,Xu Y,Yang X.Some nonlinear conjugate gradient methods with suffi cient descent condition and global convergence[J].Optim.Lett.,2015,9(7):1421–1432.
[12]Dong X.Comment on“A new three-term conjugate gradient method for unconstrained problem”[J]. Numer.Algorithms,2016,72(1):173–179.
[13]Dong X,Liu H,He Y.A self–adjusting conjugate gradient method with suffi cient descent condition and conjugacy condition[J].J.Optim.Theory Appl.,2015,165(1):225–241.
[14]Dong X,Li C,He Y,Yu G.Arelaxed two–stage multi–spllting algorithm for bi–obstacle problems[J]. J.Math.,2011,31(2):323–330.
[15]He Y,Dong X.On the convergence of Levenberg–Marquardt method for nonlinear inequalities[J]. J.Math.,2012,32(1):25–34.
[16]Yao S W,Wei Z X,Huang H.Anote about WYL’s conjugate gradient method and its applications[J]. Appl.Math.Comput.,2007,191(2):381–388.
[17]Wolfe,P.Convergence conditions for ascent methods[J].SIAM Rev.,1969,11(2):226–235
[18]Mor′e J J,Garbow B S,Hillstrom K E.Testing unconstrained optimization software[J].ACM Trans. Math.Softw.,1981,7(1):17–41.
[19]Dai Y H,Ni Q.Testing diff erent conjugate gradient methods for large-scale unconstrained optimization[J].J.Comput.Math.,2003,21(3):311–320.
一類新的具有充分下降條件和強收斂性的共軛梯度法
董曉亮1,2,何郁波3,孔翔宇1, 李衛(wèi)軍1
(1.北方民族大學數(shù)學與信息學院, 寧夏 銀川 750021)
(2.懷化學院數(shù)學與應(yīng)用數(shù)學系, 湖南 懷化 418008)
(3.北方民族大學網(wǎng)絡(luò)信息技術(shù)中心, 寧夏 銀川 750021)
本文研究了求解無約束優(yōu)化問題的WYL共軛梯度法.利用修正迭代格式,得到了算法在每步迭代能產(chǎn)生不依賴于搜索條件的充分下降方向. 同時, 在原算法中關(guān)于Wolfe條件中參數(shù)去掉的情況下, 獲得了本文算法是強收斂的.數(shù)值實驗說明本文算法可以有效求解測試問題.
共軛梯度法;充分下降條件;強收斂性;Wolfe搜索
90C30;65K10
O224
tion:90C30;65K10
A < class="emphasis_bold">Article ID:0255-7797(2017)02-0231-08
0255-7797(2017)02-0231-08
?Received date:2014-05-02 Accepted date:2015-03-16
Foundation item:Supported by National Natural Science Foundation of China(11601012; 11661002);Ningxia Natural Science Foundation(NZ13095;NZ16093);Scientifl c Research Foundation of the Higher Education Institutions of Ningxia(NGY2016134);Beifang University of Nationalities Foundation(2016SXKY06;2014XBZ09;2014XBZ01;2013XYZ028).
Biography:Dong Xiaoliang(1981–),male,born at Lingwu,Ningxia,lecturer,major in nonlinear optimization.