基于流形正則化和核方法的最小二乘算法

2017-12-26 05:39汪寶彬彭超權(quán)李學(xué)鋒

中南民族大學(xué)學(xué)報(bào)（自然科學(xué)版） 2017年4期

關(guān)鍵詞：希爾伯特流形正則

汪寶彬，彭超權(quán)，李學(xué)鋒

(中南民族大學(xué) 數(shù)學(xué)與統(tǒng)計(jì)學(xué)學(xué)院，武漢 430074)

基于流形正則化和核方法的最小二乘算法

汪寶彬，彭超權(quán)，李學(xué)鋒

(中南民族大學(xué) 數(shù)學(xué)與統(tǒng)計(jì)學(xué)學(xué)院，武漢 430074)

研究了再生核希爾伯特空間中流形正則化下的最小二乘算法的學(xué)習(xí)能力和收斂速度.該算法能夠充分利用輸入空間的幾何特點(diǎn)以及半監(jiān)督學(xué)習(xí)中無標(biāo)記樣本的信息，提高算法的有效性和學(xué)習(xí)效率.另外，討論了該算法中正則參數(shù)的選取，這對(duì)算法實(shí)現(xiàn)具有現(xiàn)實(shí)的意義.

流形學(xué)習(xí)；正則化；最小二乘算法；核方法；再生核希爾伯特空間

對(duì)應(yīng)的經(jīng)驗(yàn)泛化誤差為：

給定假設(shè)函數(shù)空間H,最小二乘算法定義為：

fz:=arg minf∈Hεz(f).

(1)

在本文中，把再生核希爾伯特空間(RKHS)HK[1]作為假設(shè)空間，這里K(·,·):X×X→R是Mercer核，即連續(xù)、對(duì)稱、半正定且：

1 建立算法

近年來，多罰研究[8]引起統(tǒng)計(jì)學(xué)界的廣泛關(guān)注，其在病態(tài)問題中構(gòu)造穩(wěn)健解的優(yōu)勢(shì)被工業(yè)界看重.它能夠結(jié)合條件概率中的先驗(yàn)測(cè)度信息和罰函數(shù)項(xiàng)，進(jìn)行合適的正則參數(shù)選擇，達(dá)到學(xué)習(xí)的最優(yōu)效果.在學(xué)習(xí)理論的背景下，Berklin等人[9]提出多罰正則化方法，對(duì)條件概率中的幾何結(jié)構(gòu)進(jìn)行分析，從理論和實(shí)驗(yàn)對(duì)該方法進(jìn)行分析；石等人[10]通過積分算子理論，在分布式學(xué)習(xí)背景中對(duì)多罰方法的數(shù)學(xué)理論有更進(jìn)一步的研究.本文主要考慮如下算法:

(2)

性質(zhì)1 對(duì)于任意μ>0,算法(2)有唯一解如下：

自然地，定義算法(2)的積分形式:

(3)

顯然，fμ=(LK+μT*T)-1LK(f*). 這里，積分算子LK:HK→HK定義為：

2 主要結(jié)果及證明

‖fμ-f*‖=O(μr).

證明根據(jù)表達(dá)式(3)，有：

fμ-f*=(LK+μT*T)-1LKf*-f*=

-μT*T(LK+μT*T)-1f*.

因此：

現(xiàn)在估計(jì)的重點(diǎn)在于抽樣誤差fz,μ-fμ，首先要用到以下引理1.

引理1 假設(shè)ζi,i=1,…，m是定義在希爾伯特空間(H,‖·‖)上一列獨(dú)立同分布的隨機(jī)變量，并且存在常數(shù)S>0,使得‖ζi‖≤S,i=1,…,m，則對(duì)任意0<δ<1，有：

于是，有定理2.

定理2 假設(shè)存在S>0, 有|y|≤S,則對(duì)任意0<δ<1，有：

證明從算法(2)的表達(dá)式得到：

從而，

‖fz,μ-fμ‖≤

對(duì)于第二項(xiàng)‖LK-LK,z‖，令ζi=·,KxiKKxi,則‖·,KxiKKxi‖≤1,i=1,…，m.由引理1有δ.結(jié)合第一項(xiàng)和第二項(xiàng)的估計(jì)，并將δ替換成δ/2,得到在至少1-δ的概率下，有下式成立：

定理2證畢.

現(xiàn)在已有抽樣誤差和逼近誤差兩項(xiàng)估計(jì)，由算法(2)的學(xué)習(xí)速率很容易得到定理3.

(4)

(5)

[1] Micchelli C A, Xu Y S, Zhang H Z. Universal kernels[J]. Journal of Machine Learning Research,2006,7(4):2651-2667.

[2] Evgeniou T, Pontil M, Poggio T. Regularization networks and support vector machines[J]. Advances in Computational Mathematics,2000,13:1-53.

[3] Bartlett P L, Mendelson S. Rademacher and Gaussian complexities: risk bounds and structural results[J]. Journal of Machine Learning Research,2002,3:463-482.

[4] Zhang T. Statistical behavior and consistency of classification methods based on convex risk minimization[J].Annals of Statistics, 2004,32:56-85.

[5] Bartlett P L, Jordan M I, McAuliffe J D. Convexity, classification, and risk bounds[J]. Journal of the American Statistical Association, 2006,101:138-156.

[6] Zhou D X. Capacity of reproducing kernel spaces in learning theory[J]. IEEE Transactions on Information Theory,2003,49:1743-1752.

[7] Shi L, Feng Y L, Zhou D X. Concentration estimates for learning withl1-regularizer and data dependent hypothesis spaces[J]. Applied and Computational Harmonic Analysis,2011,31:286-302.

[8] Abhishake, Sivananthan S. Multi-penlty regularization in learning theory[J]. Journal of Complexity,2016,36(C):141-165.

[9] Berklin M, Niyogi P, Sindhwan V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples[J]. Journal of Machine Learning Research,2006,7(1):2399-2434.

[10] Shi L, Guo Z C, Lin S B. Distributed learning with multi-penalty regularization[J]. Applied and Computational Harmonic Analysis,DOI: 10.1016/j.acha.2017.06.001.

TheLeastSquareAlgorithmBasedonManifoldRegularizationandKernelMethod

WangBaobin,PengChaoquan,LiXuefeng

(College of Mathematics and Statistics, South-Central University for Nationalities, Wuhan 430074, China)

In this paper, we considered the learning ability and convergence rate of the least square algorithm under the manifold regularization in the Reproducing Kernel Hilbert Space(RKHS). This algorithm can make full use of the geometric construction characteristics of the input space and improve the validity and the learning efficiency of the classical least square algorithm by extracting the information from the unlabeled data. Moreover, we discussed the choice of the regularization parameter, which is meaningful to the design of the algorithm.

manifold learning;regularization; least square algorithm; kernel method; reproducing kernel Hilbert space

2017-09-02

汪寶彬(1976-)，男，副教授，博士，研究方向：概率統(tǒng)計(jì)，E-mail: wbb1818@126.com

國(guó)家自然科學(xué)基金資助項(xiàng)目(11671307)；湖北省自然科學(xué)基金資助項(xiàng)目(2017CFB523)

O211

1672-4321(2017)04-0143-03

国产日韩欧美一区二区三区三州_亚洲少妇熟女av_久久久久亚洲av国产精品_波多野结衣网站一区二区_亚洲欧美色片在线91_国产亚洲精品精品国产优播av_日本一区二区三区波多野结衣 _久久国产av不卡

基于流形正則化和核方法的最小二乘算法

1 建立算法

2 主要結(jié)果及證明