韋春妙 龐建華 黃李韋 羅杰明
摘??? 要:優(yōu)化算法研究,主要工作是給迭代點(diǎn)尋求可接受且有效的步長(zhǎng)及可行的下降方向.在求解大規(guī)模無(wú)約束優(yōu)化問(wèn)題時(shí),共軛梯度法被廣泛應(yīng)用.其中, Polak-Ribiere-Polyak方法(簡(jiǎn)稱:PRP方法)是眾多共軛梯度法中數(shù)值表現(xiàn)相對(duì)較好的,但它在許多線搜索下并不具備全局收斂性,如何發(fā)揮PRP方法數(shù)值優(yōu)良,而克服其收斂性差,是學(xué)者們致力探索的熱點(diǎn)課題.本文提出新的PRP參數(shù)公式,并對(duì)Armijo線搜索方法進(jìn)行修正,建立了新Armijo線搜索下的PRP共軛梯度算法,證明算法滿足充分下降條件,并證明算法在適當(dāng)條件下具有全局收斂性.
關(guān)鍵詞:無(wú)約束優(yōu)化; PRP共軛梯度法;新Armijo線搜索;全局收斂性
中圖分類號(hào): O224??????????? DOI:10.16375/j.cnki.cn45-1395/t.2019.02.016
0??? 引言
諸多領(lǐng)域如工業(yè)、農(nóng)業(yè)、經(jīng)濟(jì)等很多最優(yōu)化問(wèn)題,都可以歸結(jié)為大規(guī)模的無(wú)約束優(yōu)化問(wèn)題.本文考慮無(wú)約束優(yōu)化問(wèn)題:
[min f(x) , x∈Rn]??? []??????????????????????????????????????????????????????????????? (1)
其中[f]:[Rn→R]為連續(xù)可微函數(shù).
求解式(1)的方法有很多, 共軛梯度法是最為常用的方法之一. 共軛梯度法的研究已有五十多年的歷史, 由于其具有較低的存儲(chǔ)量與較小的計(jì)算量等特性,被學(xué)者們廣泛研究.著名數(shù)學(xué)家Beale等,把該方法應(yīng)用到非線性優(yōu)化問(wèn)題中并取得了顯著性成果.經(jīng)典的共軛梯度法有Fletcher-Reeves方法[1](簡(jiǎn)稱[FR]法)、Hestenses-Stiefel方法[2](簡(jiǎn)稱[HS]法)、Dai-Yuan方法[3](簡(jiǎn)稱[DY]法)、Polak-Ribiere-Polyak方法[4](簡(jiǎn)稱[PRP]法),這些方法中,[FR]法收斂速度慢,數(shù)值計(jì)算不理想;[HS]法采用精確線搜索時(shí),若目標(biāo)函數(shù)是非凸函數(shù),此方法通常不具有全局收斂性.而[DY]法在特定的線搜索下,不能完全使得算法滿足充分下降條件, Al-Baali[[5]],Gilbert等[[6]]證明了在某些線搜索下,除了滿足基本條件,[DY]法需具備更多條件才具有全局收斂性.[PRP]方法是當(dāng)今眾多方法中,公認(rèn)的數(shù)值表現(xiàn)較優(yōu)的共軛梯度法.當(dāng)運(yùn)用[PRP]共軛梯度法在極小點(diǎn)附近產(chǎn)生小步長(zhǎng)時(shí),它后續(xù)所生成的搜索方向[dk]會(huì)自動(dòng)向負(fù)梯度方向靠近,從而有效地避免[FR]方法中出現(xiàn)不斷產(chǎn)生小步長(zhǎng)的問(wèn)題.不僅如此,[PRP]方法同時(shí)可以有效解決存儲(chǔ)空間過(guò)大的問(wèn)題.但是在非精確線搜索下,即使[PRP]方法有比較好的數(shù)值結(jié)果,在全局收斂性上卻并不優(yōu)良.許多學(xué)者不斷探究,獲得了可喜的結(jié)論,韋增欣等[7]提出了一個(gè)非精確[Armijo]線搜索的[PRP]算法,并證明了非凸無(wú)約束優(yōu)化問(wèn)題在所提算法下具有全局收斂性.本研究將[Armijo]線搜索進(jìn)一步修正,建立一個(gè)新的共軛梯度算法,并證明了在此線搜索下算法具有全局收斂性.
共軛梯度法的通常迭代格式為:
[xk+1=xk+ikdk, k=0,? 1,? 2,? 3,? 4,? …]????????????????????????????????????????? (2)
其中,[xk]表示第[k]次迭代,[ik]為第[k]次的步長(zhǎng),[dk]表示第[k]次搜索的方向,一般共軛梯度法的搜索方向定義為:
[dk=-gk, ???????????????? k=0 ;-gk+βkdk-1, k>0;]???????????????????????????? (3)
其中,[gk=?f(xk)], [βk]是調(diào)控參數(shù). [βk]由于有不同的取法,因而產(chǎn)生了不同方法.
如何沿著搜索方向[dk]找到一個(gè)適合的步長(zhǎng)因子[ik]是很重要的,比較常用的線搜索方法有:Wolfe-Powell型線搜索,Armijo-Goldstein型線搜索等.韋增欣等[7]運(yùn)用上述準(zhǔn)則研究[PRP]方法的全局收斂性,并且提出一種線搜索Armijo-Type [line] [search](ATLS)方法.方法如下:
令[α∈[0,12)],[c∈(0,1)],[μ>0].令[tk]=[ρjk],則:
[f(xk+ρjdk)-f(xk)≤αρjgTkdk-μ2(ρj)2dk2]
上式中,[α]、c、p為介于0和1之間的常數(shù),[μ]為非負(fù)常數(shù),[jk]是使得式子成立的最小非負(fù)整數(shù)[j],且
[g(xk+ρjdk)TQPRPk(j)≤-cg(xk+ρjdk)2]
其中T表示轉(zhuǎn)置,[QPRPk(j)]定義為:
[QPRPk(j)=-g(xk+ρjdk)+g(xk+ρjdk)Tg(xk+ρjdk)-gkgk2dk]
證明了[PRP]方法在此線搜索下,滿足充分下降條件:
[gTkdk≤-cgk2],[c∈(0,1)]?????????????????????????? (4)
本文嘗試了一個(gè)新的線搜索Armijo-Type [line] [search](簡(jiǎn)記ATLS*),對(duì)于搜索方向,基于經(jīng)典的[βPRPk],本文提出新的[PRP]參數(shù)公式:
[β*k=gTk2gk-1-gkgkgk-gk-1gTkgk-1].??????????????????????? (5)
1??? 算法描述
1.1?? ATLS*線搜索與算法
首先,令[ik=ρjk],[ik]為步長(zhǎng)因子,[jk]是使下列式子成立的最小的非負(fù)整數(shù)[j],
[f(xk+ρjdk)-f(xk)≤αρjgTkdk-μ2(ρj)2dk2]????????????????? (6)
且有
[g(xk+ρjdk)TQ*k(j)≤-cg(xk+ρjdk)2]???????????????????? (7)
其中[Q*k(j)]定義為:
[Q*k(j)=-g(xk+ρjdk)+g(xk+ρjdk)T2gk-g(xk+ρjdk)g(xk+ρjdk)g(xk+ρjdk)-gkgk2dk]??? (8)
其次,對(duì)目標(biāo)函數(shù)[f(x)]進(jìn)行如下假設(shè):
假設(shè)1?? [φ={x∈Rn:? f(x)≤f(x1)}]是一個(gè)有界的水平集.
假設(shè)2?? [f(x)]是連續(xù)可微的函數(shù),[f(x)]的梯度[Lipschitz]連續(xù).即存在常數(shù)[L>0],使得對(duì)[?x,y∈φ]都有[g(x)-g(y)≤Lx-y].
引理1? 若對(duì)于[?k∈N,? gTkdk<0],就存在一個(gè)非負(fù)整數(shù)[jk],使[ik=ρjk]滿足新的[ATLS*]線搜索算法(6)—?????? 算法(8),其中[α∈[0, 12) ,? μ>0].
證明? 首先證明有[j0].對(duì)于所有[j>j0],使得式(6)成立.用反證法證明.
假設(shè)對(duì)于任意[j]:
[f(xk+ρjdk)-f(xk)>αρjgTkdk-μ2(ρj)2dk2].
由泰勒展開(kāi)式有:
[ρjgTkdk+ο(ρj)>αρjgTkdk-μ2(ρj)2dk2].
不等式兩邊除以[ρj]并且令[j→∞],得到:
[gTkdk≥αgTkdk].
因此,由[gTkdk<0],可得[α>1].與[α∈[0,1/2)]相違背.
其次,證明存在[j1∈(j0,+∞)]使得式(7)成立.假設(shè)結(jié)論不成立,對(duì)于任意[j>j0],有:
[g(xk+ρjdk)TQ*k(j)>-cg(xk+ρjdk)2]?????????????????? (9)
將[Q*k(j)]代入式(9),即:
[g(xk+ρjdk)TQ*k(j)=-g(xk+ρjdk)2+g(xk+ρjdk)T2gk-g(xk+ρjdk)g(xk+ρjdk)g(xk+ρjdk)-gkgk2g(xk+ρjdk)Tdk>-cg(xk+ρjdk)2]
令[j∈(j0,+∞)]且[j→∞],又由[g]的連續(xù)性可得:
[-gk2≥-cgk2]
由[gk≠0](0——零向量)知[c>1].與[c∈(0, 1)]相矛盾.
綜上所述,存在一個(gè)[jk]使得式(6)和式(7)成立.
1.2?? 新ATLS*線搜索下的PRP算法(簡(jiǎn)稱新ATLS*算法)
Step 1? 給定初值[x1∈χn, α∈[0, 1/2) , c∈(0, 1) , μ>0.]令[d1=-g1, k=1.]若[g1=0],則停止,否則轉(zhuǎn)Step 2.
Step 2? 求出滿足新[ATLS*]線搜索式(6)—式(8)的步長(zhǎng)[ik>0].
Step 3? 迭代公式為:[xi+1=xi+ikdk, gk+1=g(xk+1).]若[gk+1=0],則停止,否則轉(zhuǎn)Step 4.
Step 4? 由式(5)計(jì)算[βk+1],由式(3)計(jì)算[dk+1].
Step 5? 令[k=k+1],轉(zhuǎn)Step 2.
2??? 算法的收斂性分析
引理2? 考慮上述算法,對(duì)于[?k∈N],若[gTkdk<0],則:
[gTk+1dk+1≤-cgk+12].??????????;????????????????? (10)
證明 根據(jù)引理1,由新[ATLS*]線搜索產(chǎn)生的步長(zhǎng),并由上列算法可以得到迭代點(diǎn)[xk+1],[gk+1],[βk+1],[dk+1].
所以
[gTk+1dk+1=-gk+12+β*k+1gTk+1dk=- gk+12+gTk+12gk-gk+1gk+1gk+1-gkgk2gTk+1dk=]
[-g(xk+ikdk)2+g(xk+ikdk)T2gk-g(xk+ikdk)g(xk+ikdk)g(xk+ikdk)-gkgk2g(xk+ikdk)Tdk=g(xk+ρjdk)TQ*k(j)]
因?yàn)閇ik]滿足式(7),所以
[gTk+1dk+1][=g(xk+ρjdk)TQ*k(j)][≤-cgk+12]
即式(10)成立.
定理1? 對(duì)新ATLS*線搜索下的PRP算法,如果[gk≠0],則對(duì)[?k>0],有[gTkdk≤-cgk2],[c∈(0,1)],即算法充分下降.
證明? 假設(shè)[g1≠0],因?yàn)閇d1=-g1],那么[gT1d1=-g12≤-cg12].如果[gk≠0],則有[gTkdk≤-cgk2<0],且[c∈(0,1)],有引理2可得:
[gTk+1dk+1≤-cgk+12]
通過(guò)數(shù)學(xué)歸納法便可推出定理1.
引理3?? 當(dāng)假設(shè)1成立時(shí),有:
[limk→∞ikdk=0 ,]?????????????????????????? (11)
與
[limk→∞-ikgTkdk=0 .]????????????????????????? (12)
證明?? 因?yàn)閇{f(xk)}]是遞減序列,而由新ATLS*算法產(chǎn)生的序列[xk]包含于[φ],由假設(shè)1,存在一常數(shù)[f1]使得
[limk→∞f(xk)=f1 .]
那么,
[k=1∞(f(xk)-f(xk+1))=limk→∞k=1∞(f(xk)-f(xk+1))]
[limN→∞(f(x1)-f(xN+1))=f(x1)-f1]
則:
[k=1∞(f(xk)-f(xk+1))<+∞].
又由式(6)得:
[f(xk+ikdk)-f(xk)≤αikgTkdk-μ2(ik)2dk2],
所以,[k=1∞i2kdk2<∞且 k=1∞-ikgTkdk<∞],因此,式(11)與式(12)成立.
引理4? 如果假設(shè)1和假設(shè)2均成立,[xk]由新ATLS*算法得到,存在一個(gè)常數(shù)[M1>0]使得對(duì)[?k],有
[ik≥M1gk2dk2].
證明?? 分成如下情形證明:
情形1?? [ik=1]時(shí)有:
[gk2≤1cgTkdk≤1cgkdk]
由式(10),有:
[gk≤1cdk]
由[ik=1]有:
[gk2≤1c2dk2=ikc2dk2]
所以
[ik≥c2gk2dk2]
情形2?? [ik<1]時(shí),有[jk-1]是一個(gè)非負(fù)整數(shù),從對(duì)[ik]的定義(6)不能都滿足[ikρ=ρjk-1].
因?yàn)閇ikρ]不滿足式(6),有:
[f(xk+(ikρ)dk)-f(xk)>α(ikρ)gTkdk-μ2(ikρ)2dk2].
運(yùn)用中值定理得到[θk∈(0,1)],有:
[(ikρ)?g(xk+θk(ikρ)dk)Tdk>α(ikρ)gTkdk-μ2(ikρ)2dk2]
兩邊除以[ikρ],有:
[g(xk+θk(ikρ)dk)Tdk>αgTkdk-μ2(ikρ)dk2]
不等式兩邊減去[gTkdk],有:
[(g(xk+θk(ikρ)dk)-gk)Tdk>-(1-α)gTkdk-μ2(ikρ)dk2,]
結(jié)合假設(shè)2有:
[Lθk(ikρ)dk2>-(1-α)gTkdk-μ2(ikρ)dk2].
因此,
[ik>2(1-α)ρ2Lθk+μ(-gTkdk)dk2].
故由式(4)與[θk∈(0,1)],有:
[ik>2c(1-α)ρ2Lθk+μgTk2dk2≥2c(1-α)ρ2L+μgTk2dk2]
綜上情形1和情形2得證.
令:????????????????????????????????????????????????? [M1=minc2, 2(1-α)ρ2L+μ],
引理4? 得證.
引理5? 如果假設(shè)1和假設(shè)2均成立,且對(duì)于[?k,?ε>0,]使得
[gk≥ε] ???????????????????????????????? (13)
即有常數(shù)[M2>0],對(duì)[?k]有:
[dk≤M2]????????????????????????????????????? (14)
證明? 運(yùn)用Cauchy-schwartz不等式,由式(3)有:
[dk≤gk+β*kdk-1≤gk+gTk2gk-1-gkgkgk-gk-1gk-12dk-1≤gk+gk?2gk-1-gkgkgk-gk+gk-gk-1gk-12?dk-1≤gk+gk2gk-1-gkgkgk-gk+gk-gk-1gk-12?dk-1≤gk+(2gk-1-gk)gk-(gk)gk+gkgk-gk-1gk-12?dk-1≤gk+2(gk-1-gk)gk+gkgk-gk-1gk-12?dk-1≤gk+2gk-1-gkgk+gkgk-gk-1gk-12?dk-1≤gk+3gk?gk-gk-1gk-12 dk-1]
再由假設(shè)2和式(13)可得:
[dk≤gk+3L gk?xk-xk-1gk-12 dk-1≤gk+ gk 3Lik-1dk-1ε2 dk-1]???????? (15)
由于[xk]是有界序列,結(jié)合假設(shè)2有,存在[M3>0],使得對(duì)[?k],
[gk≤M3]?????????????????????????????????? (16)
所以由式(15)和式(16)有:
[dk≤M3+3M3Lε2ik-1dk-12=M3+3M3Lε2ik-1dk-1dk-1]????????????? (17)
又由引理[3]的式(11)可推出:存在一個(gè)常數(shù)[q∈(0,1)]與一個(gè)整數(shù)[k0],對(duì)[?k≥k0]時(shí),有[3M3Lε2ik-1dk-1≤q].因此對(duì)于[?k>k0],由式(17)得:
[dk≤M3+qdk-1≤M3(1+q+q2+…+qk-k0-1)+qk-k0dk0≤?????????? M31-q+qk-k0dk0≤M31-q+dk0]
令[M2=maxd1 ,? d2 ,…,? dk0 ,? M31-q+dk0],那么式(14)對(duì)[?k]成立.
定理2?? 若上述假設(shè)1和假設(shè)2均成立,那么有:
[limk→∞infgk=0].?????????????????????????????? (18)
證明?? 首先,假設(shè)定理2不成立,則存在一個(gè)常數(shù)[ε>0],對(duì)[?k],有
[gk≥ε.]????????;??????????????????????? (19)
結(jié)合引理4,即存在一個(gè)常數(shù)[M1≥0],對(duì)于[?k, ik≥M1gk2dk2],再由引理5有[gk2≤M2M1ikdk],令[k→∞],結(jié)合式(11)與該不等式可以得到和式(19)相矛盾的結(jié)論,所以式(18)成立,即證明了全局收斂性.
3??? 討論
構(gòu)造有效的全局優(yōu)化算法[8-9], 通常是在已有算法的基礎(chǔ)上不斷改進(jìn)和創(chuàng)新.如何選取有效的非精確線搜索使得[PRP]共軛梯度法具有全局收斂性,是研究[PRP]法理論部分的難點(diǎn)所在,與之相關(guān)研究結(jié)果相對(duì)較少,本文所構(gòu)造的新ATLS*線搜索,其形式比較復(fù)雜,但并不影響算法的全局收斂性,在一定程度上豐富了[PRP]共軛梯度法的理論研究.
4??? 結(jié)論
[PRP]共軛梯度法的優(yōu)良數(shù)值效果獲得了大家的青睞, 然而,優(yōu)化算法的理論研究是最優(yōu)化理論與算法的一個(gè)重要內(nèi)容,遺憾的是,[PRP]共軛梯度法的全局收斂性在眾多的非精確線搜索下卻沒(méi)有全局收斂性.本文在韋增欣教授等研究基礎(chǔ)上,努力尋找并構(gòu)建合適的線搜索,使[PRP]共軛梯度法具有優(yōu)良的收斂性,雖然其線搜索的形式較為復(fù)雜,但最終證明[PRP]共軛梯度法在新構(gòu)建的線搜索下具有全局收斂性.
參考文獻(xiàn)
[1]???? FLETCHER R,REEVES C M.Function minimization by conjugate gradients[J].Computer Journal,1964(7):149-154.
[2]???? HESTENES M R,STIEFEL E. Method of conjugate gradient for solving linear equations[J].Journal of Research National? Bureau of Standards,1952(49):409-436.
[3]???? DAI Y H,YUAN Y.Convergence properties of the conjugate descent method[J].Advances in Mathematics,1996,25:552-562.
[4]???? POLYAK P T. The conjugate gradient method in extremal problems[J].Ussr Computation Mathematics and Mathematical? physics,1969(9):94-112.
[5]???? AL-BAALI M. Descent property and global convergence of the Fletcher-Reeves method with inexact line search[J].Ima???? Journal and Numerical Analysis,1985(5):121-124.
[6]???? GILBERT J C,NOCEDAL J. Global convergence properties of conjugate gradient methods for optimization[J].SIAM Journal of Optimization,1992,2(1):21-42.
[7]???? WEI Z X, LI G Y , QI L Q. Global convergence of the Polak-Ribiere-Polyak conjugate method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems[J].Mathematics of Computation,2008,77(264):2173-2193.
[8]???? 吳慶軍.一個(gè)全局收斂的共軛梯度法[J].廣西工學(xué)院學(xué)報(bào),2004,15(3):89-92.
[9]???? 朱孝晶,周圓兀,龔熠,等.基于多樣性優(yōu)化策略的粒子群算法[J].廣西工學(xué)院學(xué)報(bào),2001,22(1):74-77.
Global convergence analysis of the Polak-Ribiere-Polyak conjugate gradient method with a new Armijo-type line search
WEI Chunmiao, PANG Jianhua, HUANG Liwei, LUO Jieming
(Science College, Guangxi University of Science and Technology, Liuzhou 545006, China)
Abstract: The optimization algorithm research focuses on finding an acceptable and effective step size and feasible descent direction. The conjugate gradient methods are widely used in solving the???????? large-scale unconstrained optimization problems. Among them, Polak-Ribiere-Polyak method (PRP) has better numerical effect than other conjugate gradient methods. However,the global convergence has not been established for the PRP method with some inexact linear line search . How proposed new line search condition which was designed to get convergence theory to ensure the PRP method is????? globally convergent. At the same time, it remains its excellent numerical effect. In this paper, a new Armijo line searching method with a new parameter formula is proposed. The PRP conjugate gradient algorithm under the new Armijo line search is established. It is proved that the algorithm satisfies the sufficient descent condition and that the algorithm has global convergence under appropriate conditions.
Key words: unconstrained optimization; PRP conjugate gradient method; new Armijo line search;? global convergence