陳磊
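Abstract: Data preprocessing is an important step in the data mining process. This paper analyzes the drawbacks, such as low classification accuracy and low prediction accuracy, that arise when the existing classification algorithm J48 is applied directly to raw futures data. To address the continuous and time-series characteristics of futures data, it proposes a continuous-attribute partitioning strategy for futures data on the Weka data mining platform. The main idea is to apply different segment-labeling methods to the continuous attributes and to identify the partitioning scheme best suited to the characteristics of futures data, thereby substantially improving the classification and prediction accuracy of J48 while effectively reducing overfitting.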
Keywords: Weka; futures; J48; data mining; data preprocessing; continuous attribute partitioning
DOI: 10.11907/rjdk.161196
CLC number: TP391  Document code: A  Article ID: 1672-7800(2016)006-0173-03
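The abstract describes partitioning continuous attributes into labeled segments before classification, but it does not name the specific segmentation methods compared. As a minimal sketch, assuming two common unsupervised schemes (equal-width and equal-frequency binning), the idea can be illustrated as follows. The function names and the sample prices are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Return k-1 cut points so that each interval holds roughly len(values)/k items."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def label(values, cuts):
    """Map each value to the 0-based index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


# Hypothetical closing prices, for illustration only (not data from the paper).
closes = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 120.0, 119.0]

width_labels = label(closes, equal_width_cuts(closes, 4))
freq_labels = label(closes, equal_frequency_cuts(closes, 4))

print("equal-width labels:    ", width_labels)
print("equal-frequency labels:", freq_labels)
```

On this sample, equal-width binning places three of the eight prices in the lowest interval, while equal-frequency binning places two prices in each interval. Which scheme suits futures data better is the question the paper investigates; this sketch only shows how the segment labels that would replace the raw continuous values could be produced.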