李清霞
Abstract: Traditional data mining methods based on the support vector machine (SVM) can mine domain data, but their mining results show a high degree of dispersion in complex and changeable environments. An SVM-based optimized mining method for massive data is therefore proposed. A static particle space is constructed to limit the dispersion of massive data mining and to form small-scale, multi-cluster particle mining data sets. Discrete fitting is applied to the single-particle mining data, and the multi-cluster particles are integrated in a discrete operation so that the mining calculation runs periodically. Conditional constraints are imposed on the same-orbit mining calculation to achieve data mining with low dispersion. Simulation results show that the optimized SVM mining method is highly stable in complex and changeable environments, with low mining dispersion and high mining accuracy.
Keywords: massive data; support vector machine; multi-cluster particle; data fitting; integration operation; mining dispersion; optimization method
CLC number: TN911-34; TN913  Document code: A  Article ID: 1004-373X(2018)06-0137-04
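0 Introduction
The support vector machine (SVM) is a machine learning and analysis method that has attracted wide attention in recent years and is applied in function estimation, pattern recognition, image processing, bioinformatics, and many other fields. Traditional SVM mining methods can mine data within a domain, but their mining results are highly dispersed in complex and changeable environments. To address this problem, an optimized SVM mining method for massive data is proposed. Experimental results show that the method has good stability in complex and changeable environments and maintains high mining accuracy at low dispersion.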
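1 Optimized SVM mining technique for massive data
1.1 Constructing a static particle space to limit dispersion
Dispersion is extracted from each individual data unit, and data that share identity are normalized. Using kernel function calculation, the parameters reflecting this identity are put through the particle operation; the identity data extracted from the massive information define the offline properties of the initial swarm [1]. A dispersion constraint is then applied to the particles of the defined initial swarm, forming the static particle space. Within the static particle space, distorted data are deleted and the particle dispersion is bounded by the space. During particle integration [2], the particle data are merged and the mined data run in same-orbit calculation within these bounds. The periodic calculation carries some deviation. A large running deviation increases the dispersion, in which case the mined data must be re-identified [3-4]. When the deviation is zero or small, the mined data can be output. The flow of the SVM mining of massive data designed in this paper is shown in Fig. 1.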
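1.2 Discrete fitting calculation
Discrete data fitting is an integrated calculation of dispersion based on the characteristics of the defined initial swarm particles. Highly dispersed data are fitted and stripped of their original attributes, becoming new mining data made of reasonable particles. The fitting uses the fitting attribute of particle integration: particles whose data exceed the average fluctuation are stripped out [5-6], particles exceeding 5 times the fluctuation are deleted systematically, and particles at 3 to 5 times the mean variance are fitted and integrated proportionally [7-8], which preserves the calculation accuracy of each single particle. The discrete data fitting process is shown in Fig. 2.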
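The thresholds above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that "fluctuation" means the absolute deviation from the mean, that the "average fluctuation" is the mean absolute deviation, and that "proportional fitting" pulls a particle back to the 3x boundary. The function name `fit_particles` and its parameters are hypothetical.

```python
from statistics import fmean

def fit_particles(values, drop_ratio=5.0, fit_ratio=3.0):
    """Single-pass fitting of particle values (illustrative assumptions only).

    ratio = |v - mean| / mean absolute deviation
      ratio > drop_ratio               -> particle is deleted
      fit_ratio <= ratio <= drop_ratio -> deviation is scaled back to fit_ratio
      otherwise                        -> particle is kept unchanged
    """
    center = fmean(values)
    avg_fluct = fmean(abs(v - center) for v in values)
    if avg_fluct == 0:          # all particles identical, nothing to fit
        return list(values)
    fitted = []
    for v in values:
        ratio = abs(v - center) / avg_fluct
        if ratio > drop_ratio:
            continue            # too far out: delete
        if ratio >= fit_ratio:
            # proportional fitting (assumed rule): shrink toward the mean
            v = center + (v - center) * fit_ratio / ratio
        fitted.append(v)
    return fitted

if __name__ == "__main__":
    print(fit_particles([10.0] * 19 + [30.0]))  # the far outlier is removed
    print(fit_particles([0.0] * 7 + [8.0]))     # the outlier is pulled inward
```

The statistics are computed once, before any particle is changed, so the result does not depend on the order of the particles.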
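After discrete data fitting, the particles form a multi-cluster particle space, on which particle integration is performed. Simulated particle integration [9] falls into two types: point-move integration and line-move integration. Multi-cluster particle integration is illustrated in Fig. 3.
Point-move integration applies when most multi-cluster particles are linearly distributed while one or several clusters deviate from the integration curve. The deviating clusters are moved by a reasonable amount: the maximum movement must not exceed the cluster limit data, and the maximum value before discrete data fitting and integration is the minimum of the movement [10], so that the clusters run in a regular period. Line-move integration applies when many cluster particles are irregularly distributed, the mined data are in a deviated state, and the integration curve connecting the clusters is poorly representative. The integration curve itself is then adjusted [11]; this is line-move integration. The adjustment requires that [12] of the particles lie outside the integration curve and that the particles on the curve cannot represent the calculated values of all multi-cluster particles. The curve should be moved so that it connects as many multi-cluster particles as possible, and the connection should be representative and regular. Where the curve cannot connect all multi-cluster particles, line-move integration is performed first and point-move integration second, so that the mined data become periodic.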
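The two integration modes can be sketched as follows, under simplifying assumptions that are not in the source: the integration curve is a straight line, line-move integration is modelled as a least-squares refit of that line, and point-move integration shifts an off-curve cluster vertically toward the line by at most `max_shift`, which stands in for the cluster limit. All names are hypothetical.

```python
def fit_line(points):
    """Line-move step (assumed): least-squares line through cluster centres."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx               # requires at least two distinct x values
    return slope, mean_y - slope * mean_x

def point_move(points, slope, intercept, tol, max_shift):
    """Point-move step (assumed): pull off-curve clusters toward the line.

    A cluster farther than `tol` from the line is moved toward it,
    but never by more than `max_shift`.
    """
    moved = []
    for x, y in points:
        gap = slope * x + intercept - y
        if abs(gap) > tol:
            y += max(-max_shift, min(max_shift, gap))
        moved.append((x, y))
    return moved

if __name__ == "__main__":
    clusters = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 9.0)]
    # Line-move first, then point-move, following the order given in the text.
    slope, intercept = fit_line(clusters)
    print(point_move(clusters, slope, intercept, tol=0.5, max_shift=2.0))
```

Capping the shift keeps a single badly placed cluster from being forced onto the curve in one step, which matches the stated requirement that the movement stay within the cluster limit.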
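1.3 Realizing low-dispersion data mining
After the integration of multi-cluster particles, the same-orbit mining operation is carried out on the data. The SVM mining method for massive data is representative of the data to a certain degree, but this representativeness must be confirmed by accompanying verification during mining; the mined data are output once verification succeeds. The same-orbit operation uses positively biased and negatively biased mining data. The measured mining data can then be calculated on the same orbit with fixed positive and negative running deviations, which keeps the same-orbit mining operation stable. The same-orbit framework of data mining is shown in Fig. 4.
Using the MySQL, Share, Nothing, and MySQL nab calculation methods, mining calculations are performed for different objects according to the characteristics of the data being mined, and run periodically alongside the same-orbit mining calculation. The dispersion and deviation data of the same-orbit algorithm are shown in Table 1.
Through the same-orbit running of the periodic mining calculation, data information is mined stably and transferred by hardware devices. Unstable or highly dispersed mining data undergo particle fitting and multi-cluster particle integration again. The same-orbit calculation is then re-established so that the result for each group of mined data carries no dispersion, thereby realizing SVM data mining of massive data.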
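The repeat-until-stable loop described here can be sketched in Python. This is only an illustration: the paper does not give a stopping rule, so the sketch assumes the coefficient of variation as the dispersion measure and replaces the full refitting and integration step with the removal of the single most deviating value per round. The name `refit_until_stable` and the thresholds are hypothetical.

```python
from statistics import fmean, pstdev

def refit_until_stable(values, max_cv=0.05, max_rounds=10):
    """Repeat a (simplified) refit until the dispersion is small enough.

    Returns (data, cv, rounds_used). Each round drops the value farthest
    from the mean, a stand-in for refitting and re-integration.
    """
    data = list(values)
    for rounds in range(max_rounds + 1):
        mean = fmean(data)
        cv = pstdev(data) / abs(mean) if mean else float("inf")
        if cv <= max_cv or len(data) <= 2 or rounds == max_rounds:
            return data, cv, rounds
        worst = max(data, key=lambda v: abs(v - mean))
        data.remove(worst)

if __name__ == "__main__":
    print(refit_until_stable([10.0, 10.0, 10.0, 10.0, 50.0]))
```

The round limit guards against data that never settle below the threshold.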
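2 Simulation experiments and tests
To verify the effectiveness of the proposed SVM mining method, a traditional data mining method is compared with the SVM data mining method for massive data. A dispersion test and an analysis of covariance are used for the verification.
In the simulation, a specific experimental object is mined no fewer than 100,000 times or for no less than 2 h. Both SVM mining methods are applied, the marked data points are recorded, and the dispersion distribution diagram is generated by computer, as shown in Fig. 5.
The dispersion distribution diagram shows that the traditional data mining method has a high fluctuation rate and a large dispersion, with a few distorted points and uncertain data. There are three central regions of the mined data set: one in the zero-deviation range, and the other two with positive and negative deviation respectively.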
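Analysis of covariance adjusts for the effect of covariates on the dependent variable; it is a method of statistical control of experiments that combines analysis of variance with regression analysis. Combining the two analyses yields the covariance analysis diagram shown in Fig. 6.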
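For orientation, the textbook one-way analysis-of-covariance model is given below. The paper does not state its model, so the symbols are assumptions: $y_{ij}$ is the $j$-th observation under method $i$, $\mu$ the overall mean, $\tau_i$ the effect of method $i$, $x_{ij}$ the covariate, $\beta$ its regression coefficient, and $\varepsilon_{ij}$ the error term.

```latex
y_{ij} = \mu + \tau_i + \beta \left( x_{ij} - \bar{x} \right) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The regression term removes the part of the variation explained by the covariate, so the method effects $\tau_i$ are compared on adjusted values.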
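In Fig. 6, $\Sigma x$ denotes the stability value and $\Sigma y$ the dispersion value. $\Sigma x < 2$ indicates stability, and $\Sigma x = 0$ indicates a constant, most stable measurement. $\Sigma y < 4$ indicates that dispersion is negligible, and $\Sigma y = 0$ indicates no dispersion. Measured from the covariance analysis diagram, the traditional data mining method has stability $\Sigma x = 1$ and dispersion $\Sigma y = 3$, whereas the SVM data mining method for massive data has stability $\Sigma x = 0.3$ and dispersion $\Sigma y = 1$. The results of the dispersion test, the coefficient of variation test, and the analysis of covariance show that the SVM data mining method for massive data has low dispersion and good data reliability.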
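The acceptance thresholds stated above can be written as a small check. The function name `assess` is hypothetical; the thresholds and the two pairs of values are the ones reported in the text.

```python
def assess(sigma_x, sigma_y):
    """Apply the stated thresholds: sigma_x < 2 is stable, sigma_y < 4 is negligible dispersion."""
    return {
        "stable": sigma_x < 2,
        "dispersion_negligible": sigma_y < 4,
    }

if __name__ == "__main__":
    print("traditional:", assess(1, 3))
    print("proposed:   ", assess(0.3, 1))
```

Both reported pairs fall inside the thresholds; the proposed method differs by lying closer to zero on both axes.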
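3 Conclusion
The mining method is optimized by constructing a static particle space and applying multiple dispersion-removing operations to the particles. Experimental results show that the optimized SVM mining method for massive data has low dispersion, good stability, and high mining accuracy in complex and changeable environments.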
References
[1] XIAO Bai, NIE Peng, MU Gang, et al. A spatial load forecasting method based on multilevel clustering analysis and support vector machine [J]. Automation of electric power systems, 2015, 39(12): 56-61.
[2] WANG Ning, XIE Min, DENG Jialiang, et al. Mid-long term temperature-lowering load forecasting based on combination of support vector machine and multiple regression [J]. Power system protection and control, 2016, 44(3): 92-97.
[3] JIAO Weidong, LIN Shusen. Overall-improved fault diagnosis approach based on support vector machine [J]. Chinese journal of scientific instrument, 2015, 36(8): 1861-1870.
[4] WANG Yu, YUAN Jinsha, SHANG Haikun, et al. Optimization strategy research on combined-kernel support vector machine for partial discharge pattern recognition [J]. Transactions of China electrotechnical society, 2015, 30(2): 229-236.
[5] XUE Haoran, ZHANG Keheng, LI Bin, et al. Fault diagnosis of transformer based on the cuckoo search and support vector machine [J]. Power system protection and control, 2015, 43(8): 8-13.
[6] ZHANG Yuxin, CHENG Zhifeng, XU Zhengping, et al. Application of optimized parameters SVM based on photoacoustic spectroscopy method in fault diagnosis of power transformer [J]. Spectroscopy and spectral analysis, 2015, 35(1): 10-13.
[7] LI Xiao, WANG Xin, ZHENG Yihui, et al. Short-term wind load forecasting based on improved LSSVM and error forecasting correction [J]. Power system protection and control, 2015, 43(11): 63-69.
[8] LIANG Liming, ZHONG Zhen, CHEN Zhaoyang. Research and simulation of kernel function selection for support vector machine [J]. Computer engineering and science, 2015, 37(6): 1135-1141.
[9] SHAO Y H, HUA X Y, LIU L M, et al. Combined outputs framework for twin support vector machines [J]. Applied intelligence, 2015, 43(2): 424-438.
[10] GUI G, PAN H, LIN Z, et al. Data-driven support vector machine with optimization techniques for structural health monitoring and damage detection [J]. KSCE journal of civil engineering, 2017, 21(2): 523-534.
[11] THARWAT A, HASSANIEN A E, ELNAGHI B E. A BA-based algorithm for parameter optimization of support vector machine [J]. Pattern recognition letters, 2017, 93(7): 13-22.