不完備多粒度決策系統(tǒng)的局部最優(yōu)粒度選擇
顧沈明 顧金燕 吳偉志 李同軍 陳超君
(浙江海洋大學(xué)數(shù)理與信息學(xué)院 浙江舟山 316022) (浙江省海洋大數(shù)據(jù)挖掘與應(yīng)用重點(diǎn)實(shí)驗(yàn)室(浙江海洋大學(xué)) 浙江舟山 316022)
(gsm@zjou.edu.cn)
粒計(jì)算是知識(shí)表示和數(shù)據(jù)挖掘的一個(gè)重要方法.從粒計(jì)算的觀點(diǎn)來(lái)看,一個(gè)粒是由多個(gè)較小的顆粒組成的更大的單元.在許多實(shí)際應(yīng)用中,按不同的標(biāo)記尺度對(duì)數(shù)據(jù)集進(jìn)行劃分會(huì)得到不同層次的粒度,因此人們?cè)谟昧S?jì)算解決問(wèn)題時(shí)自然而然地考慮不同層次的粒度問(wèn)題,這就促使思考如何選擇一個(gè)合適的粒度層次來(lái)解決問(wèn)題.圍繞不完備多粒度決策系統(tǒng),研究了基于局部最優(yōu)粒度的規(guī)則提取方法:1)介紹了不完備多粒度決策系統(tǒng)的概念;2)在協(xié)調(diào)的不完備多粒度決策系統(tǒng)中定義了最優(yōu)粒度和局部最優(yōu)粒度,介紹了基于局部最優(yōu)粒度的屬性約簡(jiǎn)和規(guī)則提取方法;在不協(xié)調(diào)的不完備多粒度決策系統(tǒng)中引入了廣義決策,定義了廣義最優(yōu)粒度和廣義局部最優(yōu)粒度,并給出了基于廣義局部最優(yōu)粒度的屬性約簡(jiǎn)和規(guī)則提取方法;3)給出了在公開(kāi)數(shù)據(jù)集上的實(shí)驗(yàn)結(jié)果.
決策系統(tǒng);粒計(jì)算;局部最優(yōu)粒度;多粒度;規(guī)則提取
粒計(jì)算是當(dāng)前智能信息處理領(lǐng)域中的一個(gè)研究熱點(diǎn).它模擬人的思考模式,以粒為基本計(jì)算單位,以處理大規(guī)模復(fù)雜數(shù)據(jù)并建立有效計(jì)算模型為目標(biāo).1979年,Zadeh[1]首次提出信息粒化的概念,認(rèn)為人類(lèi)認(rèn)知能力可概括為粒化、組織和因果3個(gè)主要特征;1985年,Hobbs[2]提出了粒度的概念,描述了粒計(jì)算雛形的一些基本特征;而粒計(jì)算的概念則是由Lin[3]提出的;后來(lái)Lin[4]和Yao[5]分別對(duì)粒計(jì)算的一些基本問(wèn)題進(jìn)行了闡述;張鈴等人[6]提出的商空間理論被認(rèn)為是粒計(jì)算的另一重要模型.
近年來(lái),許多涉及具體應(yīng)用背景的粒計(jì)算模型與方法相繼提出[7-13],其中粗糙集對(duì)粒計(jì)算研究的推動(dòng)和發(fā)展起著重要作用[14-18].粗糙集是由Pawlak[19]提出的一種有效地處理不精確、不確定信息的數(shù)學(xué)理論方法.屬性約簡(jiǎn)和規(guī)則提取是粗糙集理論中非常重要的研究?jī)?nèi)容[20-21];張文修等人[22]對(duì)不協(xié)調(diào)目標(biāo)系統(tǒng)進(jìn)行屬性約簡(jiǎn);王國(guó)胤等人[23]研究了信息觀下的屬性約簡(jiǎn),以及屬性約簡(jiǎn)的不確定度量;Mi等人[24]提出一種基于變精度粗糙集的屬性約簡(jiǎn)方法;苗奪謙等人[25]給出了知識(shí)約簡(jiǎn)的一種啟發(fā)式算法;Skowron和Rauszer[26]提出的可辨識(shí)矩陣為求取屬性約簡(jiǎn)提供了很好的思路.
粗糙集在進(jìn)行數(shù)據(jù)分析時(shí),往往把數(shù)據(jù)表示為屬性—值的表格形式,稱(chēng)作信息系統(tǒng).在經(jīng)典的信息系統(tǒng)中,每一個(gè)對(duì)象在每一個(gè)屬性上只能取一個(gè)觀測(cè)值.這樣的信息系統(tǒng)稱(chēng)為單粒度信息系統(tǒng).單一粒度框架下的知識(shí)表示與數(shù)據(jù)挖掘很難滿足復(fù)雜問(wèn)題的需要.人們努力從多層次、多角度來(lái)研究復(fù)雜問(wèn)題,這就是多粒度的思想.Qian等人[27]提出樂(lè)觀粗糙集與悲觀粗糙集的多粒度粗糙集模型,其主要思想是通過(guò)屬性選擇進(jìn)行交、并運(yùn)算,再進(jìn)行數(shù)據(jù)處理.此外,人們觀察物體或處理數(shù)據(jù)時(shí),會(huì)根據(jù)不同的尺度得到不同層次的觀測(cè)值,即反映同一對(duì)象同一屬性在不同粒度層次下的變化情況.針對(duì)這種情況,Wu等人[28]提出基于多粒度標(biāo)記劃分的粗糙集數(shù)據(jù)分析方法,還進(jìn)一步討論在此模型下最優(yōu)粒度的選擇問(wèn)題[29-31];Gu等人[32-33]還給出協(xié)調(diào)和不協(xié)調(diào)的多粒度標(biāo)記決策系統(tǒng)中的知識(shí)獲取算法.
本文主要針對(duì)不完備多粒度決策系統(tǒng),進(jìn)一步討論局部最優(yōu)粒度的選擇問(wèn)題:分別在協(xié)調(diào)的和不協(xié)調(diào)的不完備多粒度決策系統(tǒng)中討論局部最優(yōu)粒度選擇與廣義局部最優(yōu)粒度選擇問(wèn)題,并討論相應(yīng)的屬性約簡(jiǎn)與規(guī)則提取等問(wèn)題.
1.1 不完備信息系統(tǒng)
一個(gè)信息系統(tǒng)是一個(gè)二元組(U,C),其中U={x1,x2,…,xn}是一個(gè)非空有限對(duì)象集,稱(chēng)為論域;C={a1,a2,…,am}是一個(gè)非空有限屬性集,對(duì)于任意的a∈C,滿足a:U→Va,也就是a(x)∈Va,x∈U,其中Va={a(x)|x∈U}稱(chēng)作a的值域[34].
當(dāng)信息系統(tǒng)中的某些值是未知的,則稱(chēng)為不完備信息系統(tǒng),仍表示為(U,C).用符號(hào)“*”表示未知值或缺省值,即如果a(x)=*,就認(rèn)為x在屬性a上的值是未知的.
對(duì)于給定的一個(gè)不完備信息系統(tǒng)(U,C),A⊆C,記:
RA={(x,y)∈U×U:∀a∈A,a(x)=a(y)∨a(x)=*∨a(y)=*}. (1)
顯然,RA是自反和對(duì)稱(chēng)的,即RA是相似關(guān)系,但一般是非傳遞的[35],記:
SA(x)={y∈U:(x,y)∈RA},x∈U, (2)
SA(x)稱(chēng)為對(duì)象x關(guān)于RA的相似類(lèi)[36],記:
U/RA={SA(x):x∈U}. (3)
設(shè)(U,C)是一個(gè)不完備信息系統(tǒng),A⊆C,X⊆U,X關(guān)于RA的下近似和上近似分別定義為
R̲A(X)={x∈U:SA(x)⊆X}, (4)
R̄A(X)={x∈U:SA(x)∩X≠∅}. (5)
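為便于理解式(1)~(5),下面給出一段Python示意代碼,在一個(gè)小的不完備信息系統(tǒng)上計(jì)算相似類(lèi)以及下、上近似.其中對(duì)象的字典表示、缺省值記號(hào)“*”以及similarity_class等函數(shù)名均為本文為說(shuō)明而設(shè)的假設(shè),并非原文的實(shí)現(xiàn):
# 不完備信息系統(tǒng)中相似類(lèi)與下、上近似的示意實(shí)現(xiàn)(Python)
# 假設(shè):對(duì)象用"屬性->取值"的字典表示,缺省值用字符串"*"表示
MISSING = "*"

def tolerant(u, v, attrs):
    # 式(1):u,v在屬性集attrs上相似(取值相等或至少一方缺省)
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def similarity_class(x, universe, attrs):
    # 式(2):對(duì)象x關(guān)于R_A的相似類(lèi)S_A(x)
    return [y for y in universe if tolerant(x, y, attrs)]

def lower_approx(X, universe, attrs):
    # 式(4):下近似,即相似類(lèi)完全含于X的對(duì)象全體
    return [x for x in universe if all(y in X for y in similarity_class(x, universe, attrs))]

def upper_approx(X, universe, attrs):
    # 式(5):上近似,即相似類(lèi)與X有交的對(duì)象全體
    return [x for x in universe if any(y in X for y in similarity_class(x, universe, attrs))]

# 用法示例(數(shù)據(jù)為虛構(gòu)的小例子)
U = [{"id": "y1", "a1": 1, "a2": 0},
     {"id": "y2", "a1": 1, "a2": MISSING},
     {"id": "y3", "a1": 2, "a2": 0}]
A = ["a1", "a2"]
X = U[:2]
print([y["id"] for y in similarity_class(U[0], U, A)])   # ['y1', 'y2']
print([y["id"] for y in lower_approx(X, U, A)])          # ['y1', 'y2']
print([y["id"] for y in upper_approx(X, U, A)])          # ['y1', 'y2']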
1.2 決策規(guī)則
決策系統(tǒng)是一個(gè)二元組S=(U,C∪{d}),其中(U,C)是一個(gè)信息系統(tǒng),C稱(chēng)為條件屬性,d∉C稱(chēng)為決策屬性,映射d:U→Vd,其中Vd是決策屬性d的值域[34].
設(shè)B⊆C,x∈U,由對(duì)象x產(chǎn)生的決策規(guī)則形如
t→s,其中t=∧a∈B(a,a(x)),s=(d,d(x)), (6)
t稱(chēng)為規(guī)則的前件,s稱(chēng)為規(guī)則的后件;用‖t‖表示滿足前件t的對(duì)象集合,‖s‖表示滿足后件s的對(duì)象集合.
規(guī)則t→s的可信度定義為
conf(t→s)=|‖t‖∩‖s‖|/|‖t‖|. (7)
規(guī)則的可信度反映了滿足規(guī)則前件的實(shí)例中能劃分到規(guī)則后件所表示的決策類(lèi)中的比例.可信度通常作為規(guī)則的有效性度量,用于規(guī)則的評(píng)價(jià)和檢驗(yàn).
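作為式(6)(7)的一個(gè)簡(jiǎn)單數(shù)值示例,下面的Python片段按“滿足前件且后件成立的對(duì)象數(shù)”與“滿足前件的對(duì)象數(shù)”之比計(jì)算規(guī)則可信度;其中規(guī)則前件的字典表示與函數(shù)名均為本文示例的假設(shè):
# 規(guī)則可信度conf(t→s)=|‖t‖∩‖s‖|/|‖t‖|的示意計(jì)算(Python)
def satisfies(x, condition):
    # 判斷對(duì)象x是否滿足前件t(前件中所有屬性取值都匹配)
    return all(x.get(a) == v for a, v in condition.items())

def confidence(universe, condition, d_value, d="d"):
    # 滿足前件的對(duì)象中,決策值等于d_value的比例
    matched = [x for x in universe if satisfies(x, condition)]
    return 0.0 if not matched else sum(1 for x in matched if x[d] == d_value) / len(matched)

# 用法示例(虛構(gòu)數(shù)據(jù)):規(guī)則(a1,1)→(d,yes)的可信度為0.5
U = [{"a1": 1, "d": "yes"}, {"a1": 1, "d": "no"}, {"a1": 2, "d": "no"}]
print(confidence(U, {"a1": 1}, "yes"))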
稱(chēng)(U,C)為不完備多粒度信息系統(tǒng),如果C={a1,a2,…,am}中的每個(gè)屬性aj都在I個(gè)粒度層次上取值,即
aj={aj^k:k=1,2,…,I},aj^k:U→Vj^k, (8)
其中Vj^k是屬性aj在第k層粒度上的值域(可含缺省值*),且相鄰2個(gè)粒度層次的取值之間存在映射gj^k:Vj^k→Vj^(k+1),滿足
aj^(k+1)(x)=gj^k(aj^k(x)),x∈U, (9)
其中,1≤k≤I-1.記Ck={a1^k,a2^k,…,am^k},這樣也得到了I個(gè)不完備信息系統(tǒng)Sk=(U,Ck),k=1,2,…,I.
稱(chēng)S=(U,C∪{d})為不完備多粒度決策系統(tǒng),如果(U,C)是不完備多粒度信息系統(tǒng),d∉C是一個(gè)單粒度決策屬性,d:U→Vd,Vd是決策屬性d的值域.顯然,對(duì)于給定k(1≤k≤I),就有決策系統(tǒng)Sk=(U,Ck∪{d}),因此就有I層不完備的決策系統(tǒng).
例1. 設(shè)論域U={x1,x2,…,x8},屬性集C={a1,a2,a3},決策屬性d;第1層決策系統(tǒng)S1=(U,C1∪{d}),如表1所示;第2層決策系統(tǒng)S2=(U,C2∪{d}),如表2所示;第3層決策系統(tǒng)S3=(U,C3∪{d}),如表3所示.
Table 1 Decision System of the First Level Granularity 表1 第1層粒度的決策系統(tǒng)
Table 2 Decision System of the Second Level Granularity 表2 第2層粒度的決策系統(tǒng)
Table 3 Decision System of the Third Level Granularity 表3 第3層粒度的決策系統(tǒng)
在不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,對(duì)于給定的k(1≤k≤I),有Sk=(U,Ck∪{d}).若RC1⊆Rd,則稱(chēng)S1=(U,C1∪{d})是協(xié)調(diào)的,同時(shí)也稱(chēng)S=(U,C∪{d})是協(xié)調(diào)的.若S1=(U,C1∪{d})是不協(xié)調(diào)的,則稱(chēng)S=(U,C∪{d})是不協(xié)調(diào)的.下面對(duì)協(xié)調(diào)的、不協(xié)調(diào)的不完備多粒度決策系統(tǒng)分別進(jìn)行討論.
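在具體實(shí)現(xiàn)中,判斷某一層決策系統(tǒng)Sk是否協(xié)調(diào),等價(jià)于檢驗(yàn)每個(gè)對(duì)象的相似類(lèi)是否都含于其決策類(lèi).下面給出一個(gè)Python示意代碼(每層用一張“屬性-取值”表存放、decisions為各對(duì)象的決策值,這些數(shù)據(jù)表示與函數(shù)名均為本文假設(shè));對(duì)I個(gè)層次逐層調(diào)用is_consistent,即可確定協(xié)調(diào)的最粗粒度層次:
# 判斷第k層決策系統(tǒng)S^k是否協(xié)調(diào)的示意代碼(Python)
# 假設(shè):table為該層的對(duì)象列表(字典,鍵為屬性名),缺省值用"*"表示
MISSING = "*"

def tolerant(u, v, attrs):
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def sim_ids(i, table, attrs):
    # 第i個(gè)對(duì)象在屬性集attrs下的相似類(lèi)(以對(duì)象下標(biāo)集合表示)
    return {j for j, y in enumerate(table) if tolerant(table[i], y, attrs)}

def is_consistent(table, attrs, decisions):
    # S^k協(xié)調(diào) <=> 對(duì)任意x_i,相似類(lèi)中所有對(duì)象的決策值都與x_i相同
    return all(all(decisions[j] == decisions[i] for j in sim_ids(i, table, attrs))
               for i in range(len(table)))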
3.1 局部最優(yōu)粒度
在不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,對(duì)于給定的k(1≤k≤I),若Sk=(U,Ck∪{d})是協(xié)調(diào)的,且k=I或Sk+1=(U,Ck+1∪{d})是不協(xié)調(diào)的,則稱(chēng)第k層粒度為S的全局最優(yōu)粒度.
例2. 例1中的系統(tǒng)S=(U,C∪{d}),在每一層上有相似類(lèi):
第1層.SC1(x1)={x1},SC1(x2)={x2},SC1(x3)={x3},SC1(x4)={x4},SC1(x5)={x5},SC1(x6)={x6},SC1(x7)={x7},SC1(x8)={x8};
第2層.SC2(x1)={x1,x2},SC2(x2)={x1,x2},SC2(x3)={x3},SC2(x4)={x4},SC2(x5)={x5},SC2(x6)={x6,x7},SC2(x7)={x6,x7},SC2(x8)={x8};
第3層.SC3(x1)={x1,x2,x3},SC3(x2)={x1,x2,x3},SC3(x3)={x1,x2,x3,x8},SC3(x4)={x4,x6},SC3(x5)={x5,x6,x7},SC3(x6)={x4,x5,x6,x7},SC3(x7)={x5,x6,x7},SC3(x8)={x3,x8}.
系統(tǒng)有決策類(lèi):
U/Rd={{x1,x2},{x3,x4,x5,x6,x7,x8}}.
顯然,S1=(U,C1∪{d})是協(xié)調(diào)的,所以多粒度決策系統(tǒng)S=(U,C∪{d})是協(xié)調(diào)的.S2=(U,C2∪{d})是協(xié)調(diào)的,S3=(U,C3∪{d})是不協(xié)調(diào)的,所以第2層粒度是全局最優(yōu)粒度.
在不完備多粒度決策系統(tǒng)中,全局最優(yōu)粒度是面向系統(tǒng)的.也就是說(shuō),找到全局最優(yōu)粒度是指系統(tǒng)整體達(dá)到最優(yōu).然而,對(duì)于單個(gè)對(duì)象而言,系統(tǒng)達(dá)到最優(yōu)粒度時(shí)單個(gè)對(duì)象不一定達(dá)到最優(yōu):有的對(duì)象達(dá)到最優(yōu)粒度,有的對(duì)象沒(méi)有達(dá)到最優(yōu)粒度.因此,我們來(lái)討論基于對(duì)象的局部最優(yōu)粒度.
在不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,對(duì)于x∈U,給定k(1≤k≤I),若SCk(x)⊆[x]d成立,且k=I或SCk+1(x)⊆[x]d不成立,則稱(chēng)第k層粒度為關(guān)于對(duì)象x的局部最優(yōu)粒度.
例3. 對(duì)于例1中的多粒度決策系統(tǒng)S=(U,C∪{d}),關(guān)于每個(gè)對(duì)象的局部最優(yōu)粒度如下:
SC1(x1)⊆SC2(x1)⊆[x1]d,關(guān)于x1的局部最優(yōu)粒度為第2層粒度;
SC1(x2)⊆SC2(x2)⊆[x2]d,關(guān)于x2的局部最優(yōu)粒度為第2層粒度;
SC1(x3)⊆SC2(x3)⊆[x3]d,關(guān)于x3的局部最優(yōu)粒度為第2層粒度;
SC1(x4)⊆SC2(x4)⊆SC3(x4)⊆[x4]d,關(guān)于x4的局部最優(yōu)粒度為第3層粒度;
SC1(x5)⊆SC2(x5)⊆SC3(x5)⊆[x5]d,關(guān)于x5的局部最優(yōu)粒度為第3層粒度;
SC1(x6)⊆SC2(x6)⊆SC3(x6)⊆[x6]d,關(guān)于x6的局部最優(yōu)粒度為第3層粒度;
SC1(x7)⊆SC2(x7)⊆SC3(x7)⊆[x7]d,關(guān)于x7的局部最優(yōu)粒度為第3層粒度;
SC1(x8)⊆SC2(x8)⊆SC3(x8)⊆[x8]d,關(guān)于x8的局部最優(yōu)粒度為第3層粒度.
顯然,在全局最優(yōu)的第2層粒度上,只有對(duì)象x1,x2和x3達(dá)到最優(yōu)粒度.對(duì)象x4,x5,x6,x7,x8的局部最優(yōu)粒度為第3層粒度.因此,不同的對(duì)象會(huì)在不同的粒度層次上達(dá)到最優(yōu)粒度.
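按照上述定義,每個(gè)對(duì)象的局部最優(yōu)粒度就是滿足SCk(x)⊆[x]d的最大層次k;由相似類(lèi)隨粒度變粗而單調(diào)增大可知,逐層判斷即可得到該最大層次.下面給出一個(gè)Python示意代碼,數(shù)據(jù)表示沿用前文的假設(shè)(levels[k-1]為第k層的屬性-取值表,各層屬性名相同,decisions為決策值,缺省值為“*”),僅作說(shuō)明之用:
# 計(jì)算每個(gè)對(duì)象的局部最優(yōu)粒度的示意代碼(Python)
MISSING = "*"

def tolerant(u, v, attrs):
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def sim_ids(i, table, attrs):
    return {j for j, y in enumerate(table) if tolerant(table[i], y, attrs)}

def local_optimal_level(i, levels, attrs, decisions):
    # 返回滿足S_{C^k}(x_i)⊆[x_i]_d的最大層次k(系統(tǒng)協(xié)調(diào)時(shí)第1層必然滿足)
    best = 1
    for k in range(2, len(levels) + 1):
        if all(decisions[j] == decisions[i] for j in sim_ids(i, levels[k - 1], attrs)):
            best = k
    return best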
3.2 局部相對(duì)約簡(jiǎn)
屬性約簡(jiǎn)是粗糙集理論中的一個(gè)重要研究?jī)?nèi)容.一般來(lái)說(shuō),屬性約簡(jiǎn)是保持決策系統(tǒng)某種特定性質(zhì)不變條件下的最小屬性集合.通常情況下,不同的限制條件下會(huì)得到不同的約簡(jiǎn)結(jié)果.下面主要討論保持協(xié)調(diào)性不變的條件下基于局部最優(yōu)粒度的屬性約簡(jiǎn).
在不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,給定對(duì)象x∈U的局部最優(yōu)粒度層k,若存在Bk⊆Ck,使SBk(x)⊆[x]d成立,且對(duì)于任意bk∈Bk,SBk-{bk}(x)⊆[x]d不成立,則稱(chēng)Bk是Ck關(guān)于對(duì)象x的局部相對(duì)約簡(jiǎn).
顯然,在不完備多粒度決策系統(tǒng)中,屬性約簡(jiǎn)之前要確定局部最優(yōu)粒度.在找到局部最優(yōu)粒度之后,屬性約簡(jiǎn)的方法與單粒度決策系統(tǒng)一致.
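在確定了對(duì)象x的局部最優(yōu)粒度層之后,局部相對(duì)約簡(jiǎn)可按定義用貪心刪除屬性的方式求得:只要?jiǎng)h去某個(gè)屬性后相似類(lèi)仍含于決策類(lèi),就將其刪去.下面是一個(gè)Python示意代碼(數(shù)據(jù)表示沿用前文假設(shè);結(jié)果是滿足定義的一個(gè)局部相對(duì)約簡(jiǎn),未必唯一):
# 在對(duì)象x_i的局部最優(yōu)粒度層上求一個(gè)局部相對(duì)約簡(jiǎn)的示意代碼(Python)
MISSING = "*"

def tolerant(u, v, attrs):
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def keeps(i, table, attrs, decisions):
    # 屬性子集attrs下是否仍有S_B(x_i)⊆[x_i]_d
    return all(decisions[j] == decisions[i]
               for j, y in enumerate(table) if tolerant(table[i], y, attrs))

def local_reduct(i, table, all_attrs, decisions):
    B = list(all_attrs)
    for a in list(all_attrs):          # 按固定次序逐一嘗試刪除屬性
        trial = [b for b in B if b != a]
        if trial and keeps(i, table, trial, decisions):
            B = trial
    return B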
3.3 規(guī)則提取與實(shí)現(xiàn)算法
在粗糙集理論中,隱藏在決策系統(tǒng)中的知識(shí)通常是以規(guī)則的形式被挖掘出來(lái).在不完備多粒度決策系統(tǒng)中,不同的粒度層次會(huì)有不同粒度的決策規(guī)則.為了使規(guī)則具有較好的代表性,首先根據(jù)對(duì)象選擇局部最優(yōu)粒度;然后在保持協(xié)調(diào)性不變的基礎(chǔ)上進(jìn)行屬性約簡(jiǎn);最后把規(guī)則按“條件→決策”的形式從多粒度決策系統(tǒng)中提取出來(lái).
在不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,基于局部最優(yōu)粒度的規(guī)則可以分別獲取,具體算法如算法1所示.
算法1. 協(xié)調(diào)系統(tǒng)的局部最優(yōu)粒度規(guī)則挖掘算法.
輸入: 協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d});
輸出: 局部最優(yōu)粒度下的規(guī)則.
Set opti_level[1,2,…,n]←1;
For k=2 to I do
  For i=1 to n do
    If SCk(xi)⊆[xi]d then
      opti_level[i]←k;
    EndIf
  EndFor
EndFor
For k=I to 1 do
  Set no←0;
  For i=1 to n do
    If opti_level[i]==k then
      no++;
      B←Ck;
      For j=1 to m do
        If SB-{aj}(xi)⊆[xi]d then
          B←B-{aj};
        EndIf
      EndFor
      t←∅;
      For each aj∈B do
        t←t∧(aj,aj(xi));
      EndFor
      輸出規(guī)則t→(d,d(xi));
      opti_level[i]←0;
    EndIf
  EndFor
EndFor
算法1首先計(jì)算每個(gè)對(duì)象的局部最優(yōu)粒度層次,兩重循環(huán)的時(shí)間復(fù)雜度為O(nI);但每次判斷涉及相似類(lèi)的計(jì)算,而每個(gè)相似類(lèi)的計(jì)算需要O(nm)的時(shí)間,所以這部分的時(shí)間復(fù)雜度為O(n²mI).接著根據(jù)每個(gè)對(duì)象的局部最優(yōu)粒度層次計(jì)算局部相對(duì)約簡(jiǎn),3層循環(huán)嵌套的時(shí)間復(fù)雜度為O(nmI),同樣內(nèi)循環(huán)涉及相似類(lèi)的計(jì)算,所以這部分的時(shí)間復(fù)雜度為O(n²m²I).因此總的時(shí)間復(fù)雜度為O(n²m²I).
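為便于與偽代碼對(duì)照,下面給出算法1的一個(gè)Python實(shí)現(xiàn)草圖.其中每層一張“屬性-取值”表、缺省值“*”以及mine_rules等函數(shù)名都是本文為說(shuō)明而設(shè)的假設(shè),并非原文代碼:
# 算法1(協(xié)調(diào)系統(tǒng)的局部最優(yōu)粒度規(guī)則挖掘)的Python實(shí)現(xiàn)草圖
# 假設(shè):levels[k-1]是第k層的屬性-取值表(對(duì)象字典列表,各層屬性名相同),
#       decisions[i]是對(duì)象x_i的決策值;輸入系統(tǒng)在第1層協(xié)調(diào)
MISSING = "*"

def tolerant(u, v, attrs):
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def consistent_for(i, table, attrs, decisions):
    # S_B(x_i)⊆[x_i]_d 是否成立(B=attrs)
    return all(decisions[j] == decisions[i]
               for j, y in enumerate(table) if tolerant(table[i], y, attrs))

def mine_rules(levels, attrs, decisions):
    n, I = len(decisions), len(levels)
    # 第1步:求每個(gè)對(duì)象的局部最優(yōu)粒度層次
    opti_level = [1] * n
    for k in range(2, I + 1):
        for i in range(n):
            if consistent_for(i, levels[k - 1], attrs, decisions):
                opti_level[i] = k
    # 第2步:在各自的局部最優(yōu)層上做局部約簡(jiǎn)并生成規(guī)則
    rules = []
    for k in range(I, 0, -1):
        table = levels[k - 1]
        for i in range(n):
            if opti_level[i] != k:
                continue
            B = list(attrs)
            for a in list(attrs):          # 貪心刪除可省略的屬性
                trial = [b for b in B if b != a]
                if trial and consistent_for(i, table, trial, decisions):
                    B = trial
            condition = tuple((a, table[i][a]) for a in B)
            rules.append((k, condition, decisions[i]))
            opti_level[i] = 0
    return rules
返回的每條規(guī)則記錄了所在粒度層次、前件(屬性-取值對(duì))以及決策值;與例5的結(jié)論一致,在協(xié)調(diào)系統(tǒng)中按此得到的規(guī)則可信度均為1.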
例5. 對(duì)于例1中的不完備多粒度決策系統(tǒng)S=(U,C∪{d}),根據(jù)算法1,可以得到局部最優(yōu)粒度下的規(guī)則:
顯然,上述規(guī)則的可信度都為1.
4.1 廣義局部最優(yōu)粒度
在不協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,對(duì)于任意的k(1≤k≤I),Sk=(U,Ck∪{d})都是不協(xié)調(diào)的.下面引入對(duì)象x的廣義決策∂Ck(x):
∂Ck(x)={d(y):y∈SCk(x)},x∈U. (11)
對(duì)于Sk=(U,Ck∪{d}),若∀x∈U,都有∂Ck(x)=∂C1(x),則稱(chēng)Sk=(U,Ck∪{d})是廣義協(xié)調(diào)的.對(duì)于給定的k(1≤k≤I),若Sk=(U,Ck∪{d})是廣義協(xié)調(diào)的,且k=I或Sk+1=(U,Ck+1∪{d})不是廣義協(xié)調(diào)的,則稱(chēng)第k層粒度為廣義全局最優(yōu)粒度.
同樣,廣義全局最優(yōu)粒度是面向系統(tǒng)的,對(duì)于單個(gè)對(duì)象而言,系統(tǒng)達(dá)到廣義最優(yōu)粒度時(shí)單個(gè)對(duì)象不一定達(dá)到廣義最優(yōu).有的對(duì)象達(dá)到廣義最優(yōu)粒度,有的對(duì)象沒(méi)有達(dá)到廣義最優(yōu)粒度.因此,我們來(lái)討論基于對(duì)象的廣義局部最優(yōu)粒度.
在不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,對(duì)于x∈U,給定k(1≤k≤I),若∂Ck(x)=∂C1(x)成立,且k=I或∂Ck+1(x)=∂C1(x)不成立,則稱(chēng)第k層粒度為關(guān)于對(duì)象x的廣義局部最優(yōu)粒度.
例6. 一個(gè)不協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d}),論域U={x1,x2,…,x10},屬性集C={a1,a2,a3,a4},決策屬性d;第1層決策系統(tǒng)S1=(U,C1∪{d}),如表4所示;第2層決策系統(tǒng)S2=(U,C2∪{d}),如表5所示;第3層決策系統(tǒng)S3=(U,C3∪{d}),如表6所示.
Table 4 Decision System of the First Level Granularity 表4 第1層粒度的決策系統(tǒng)
Table 5 Decision System of the Second Level Granularity 表5 第2層粒度的決策系統(tǒng)
Table 6 Decision System of the Third Level Granularity 表6 第3層粒度的決策系統(tǒng)
顯然,∀x∈{x1,x2,…,x10},∂C2(x)=∂C1(x)均成立,而∂C3(x)=∂C1(x)并非對(duì)所有對(duì)象都成立,即S2=(U,C2∪{d})是廣義協(xié)調(diào)的,S3=(U,C3∪{d})不是廣義協(xié)調(diào)的,所以第2層粒度是廣義全局最優(yōu)粒度.
關(guān)于每個(gè)對(duì)象的廣義局部最優(yōu)粒度如下:
當(dāng)xi∈{x1,x2,…,x8}時(shí),∂C3(xi)=∂C1(xi)成立,即關(guān)于對(duì)象xi的廣義局部最優(yōu)粒度為第3層粒度;
當(dāng)xi∈{x9,x10}時(shí),∂C2(xi)=∂C1(xi)成立,而∂C3(xi)=∂C1(xi)不成立,即關(guān)于對(duì)象xi的廣義局部最優(yōu)粒度為第2層粒度.
顯然,不同的對(duì)象會(huì)在不同的粒度層次上達(dá)到廣義局部最優(yōu)粒度.
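廣義決策與廣義局部最優(yōu)粒度同樣可以按定義直接計(jì)算.下面的Python示意代碼沿用前文假設(shè)的數(shù)據(jù)表示(levels、attrs、decisions),函數(shù)名為本文示例所設(shè):
# 廣義決策∂_{C^k}(x)(式(11))與廣義局部最優(yōu)粒度的示意計(jì)算(Python)
MISSING = "*"

def tolerant(u, v, attrs):
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def generalized_decision(i, table, attrs, decisions):
    # ∂_{C^k}(x_i)={d(y): y∈S_{C^k}(x_i)}
    return {decisions[j] for j, y in enumerate(table) if tolerant(table[i], y, attrs)}

def g_local_optimal_level(i, levels, attrs, decisions):
    # 返回滿足∂_{C^k}(x_i)=∂_{C^1}(x_i)的最大層次k
    base = generalized_decision(i, levels[0], attrs, decisions)
    best = 1
    for k in range(2, len(levels) + 1):
        if generalized_decision(i, levels[k - 1], attrs, decisions) == base:
            best = k
    return best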
4.2 廣義局部相對(duì)約簡(jiǎn)
在不協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,給定對(duì)象x∈U的廣義局部最優(yōu)粒度層k,若存在Bk⊆Ck,使∂Bk(x)=∂Ck(x)成立,且對(duì)于任意bk∈Bk,∂Bk-{bk}(x)=∂Ck(x)不成立,則稱(chēng)Bk是Ck關(guān)于對(duì)象x的廣義局部相對(duì)約簡(jiǎn).
4.3 廣義規(guī)則提取與實(shí)現(xiàn)算法
在不協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d})中,基于廣義局部最優(yōu)粒度的規(guī)則可以分別獲取,具體算法如算法2所示:
算法2. 不協(xié)調(diào)系統(tǒng)的廣義局部最優(yōu)粒度規(guī)則挖掘算法.
輸入: 不協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d});
輸出: 廣義局部最優(yōu)粒度下的規(guī)則.
Set gopti_level[1,2,…,n]←1;
For k=2 to I do
  For i=1 to n do
    If ∂Ck(xi)=∂C1(xi) then
      gopti_level[i]←k;
    EndIf
  EndFor
EndFor
For k=I to 1 do
  Set no←0;
  For i=1 to n do
    If gopti_level[i]==k then
      no++;
      B←Ck;
      For j=1 to m do
        If ∂B-{aj}(xi)=∂Ck(xi) then
          B←B-{aj};
        EndIf
      EndFor
      t←∅;
      For each aj∈B do
        t←t∧(aj,aj(xi));
      EndFor
      For each dj∈∂Ck(xi) do
        輸出規(guī)則t→(d,dj);
      EndFor
      gopti_level[i]←0;
    EndIf
  EndFor
EndFor
算法2首先計(jì)算每個(gè)對(duì)象的廣義局部最優(yōu)粒度層次,兩重循環(huán)的時(shí)間復(fù)雜度為O(nI);但每次判斷涉及廣義決策的計(jì)算,每次計(jì)算需要O(nm)的時(shí)間,所以這部分的時(shí)間復(fù)雜度為O(n²mI).接著根據(jù)每個(gè)對(duì)象的廣義局部最優(yōu)粒度層次計(jì)算廣義局部相對(duì)約簡(jiǎn),3層循環(huán)嵌套的時(shí)間復(fù)雜度為O(nmI),同樣內(nèi)循環(huán)涉及廣義決策的計(jì)算,所以這部分的時(shí)間復(fù)雜度為O(n²m²I).因此總的時(shí)間復(fù)雜度為O(n²m²I).
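與算法1相比,算法2的不同之處在于:約簡(jiǎn)時(shí)保持的是對(duì)象的廣義決策不變,且規(guī)則后件要取遍∂Ck(xi)中的每個(gè)決策值,因此可能得到多條規(guī)則.下面給出這一關(guān)鍵步驟的Python草圖,數(shù)據(jù)表示沿用前文假設(shè):
# 算法2中基于廣義決策的約簡(jiǎn)與規(guī)則生成步驟的Python草圖
MISSING = "*"

def tolerant(u, v, attrs):
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)

def gen_dec(i, table, attrs, decisions):
    return {decisions[j] for j, y in enumerate(table) if tolerant(table[i], y, attrs)}

def rules_for_object(i, table, attrs, decisions):
    # 在對(duì)象x_i的廣義局部最優(yōu)層table上做廣義局部約簡(jiǎn)并生成規(guī)則
    target = gen_dec(i, table, attrs, decisions)
    B = list(attrs)
    for a in list(attrs):                      # 貪心刪除不影響廣義決策的屬性
        trial = [b for b in B if b != a]
        if trial and gen_dec(i, table, trial, decisions) == target:
            B = trial
    condition = tuple((a, table[i][a]) for a in B)
    return [(condition, d_val) for d_val in target]
其中,當(dāng)∂Ck(xi)只含1個(gè)決策值時(shí)得到確定性規(guī)則,含多個(gè)決策值時(shí)得到可能性規(guī)則.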
例8. 對(duì)于例6中不協(xié)調(diào)的不完備多粒度決策系統(tǒng)S=(U,C∪{d}),根據(jù)算法2,可以得到廣義局部最優(yōu)粒度下的規(guī)則:
我們對(duì)上述算法用UCI公開(kāi)數(shù)據(jù)集進(jìn)行實(shí)驗(yàn)驗(yàn)證,對(duì)于協(xié)調(diào)的和不協(xié)調(diào)的不完備決策系統(tǒng)的實(shí)驗(yàn)情況分別介紹如下:
實(shí)驗(yàn)1. 協(xié)調(diào)的不完備決策系統(tǒng).
選用的數(shù)據(jù)集是乳腺癌數(shù)據(jù)集,共有699條記錄,除了編號(hào)外有10個(gè)屬性,其中缺省值有16個(gè).我們用映射的方法構(gòu)造了3層粒度,經(jīng)過(guò)驗(yàn)算系統(tǒng)是協(xié)調(diào)的,而且全局最優(yōu)粒度是第2層粒度.經(jīng)過(guò)程序運(yùn)行后刪除重復(fù)的規(guī)則,我們?cè)诘?層粒度上得到109條確定性規(guī)則,在第2層粒度上得到21條確定性規(guī)則.
實(shí)驗(yàn)2. 不協(xié)調(diào)的不完備決策系統(tǒng).
選用的數(shù)據(jù)集是乳房X片數(shù)據(jù)集,共有961條記錄,除了編號(hào)外有6個(gè)屬性,其中缺省值有162個(gè).我們刪除了年齡屬性后用映射的方法構(gòu)造了3層粒度,經(jīng)過(guò)驗(yàn)算系統(tǒng)是不協(xié)調(diào)的.所以引入廣義決策,并且廣義全局最優(yōu)粒度是第1層粒度.經(jīng)過(guò)程序運(yùn)行后刪除重復(fù)的規(guī)則,我們?cè)诘?層粒度上得到10條確定性規(guī)則、34條可能性規(guī)則,在第2層粒度上得到5條確定性規(guī)則、6條可能性規(guī)則,在第1層粒度上得到3條確定性規(guī)則、4條可能性規(guī)則.
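實(shí)驗(yàn)中“用映射的方法構(gòu)造3層粒度”的具體映射本節(jié)未給出,下面僅給出一種可能做法的Python草圖:按屬性取值的分段映射把第k層的值粗化為第k+1層的值,缺省值保持為“*”.其中的分段方式純屬示例假設(shè),并非實(shí)驗(yàn)中實(shí)際使用的映射:
# “用映射的方法構(gòu)造粒度層次”的一種示意做法(Python)
MISSING = "*"

def coarsen(value, g):
    # g為第k層到第k+1層的取值映射;缺省值保持缺省
    return MISSING if value == MISSING else g(value)

def build_levels(table, attrs, mappings):
    # 由第1層表逐層生成第2,3,…層表;mappings[k-1]為第k層到第k+1層的映射
    levels = [table]
    for g in mappings:
        prev = levels[-1]
        levels.append([{a: (coarsen(x[a], g) if a in attrs else x[a]) for a in x}
                       for x in prev])
    return levels

# 用法示例(假設(shè)的映射):把1~10的取值先粗化為1~5,再粗化為"低/高"兩檔
table1 = [{"a1": 3, "a2": MISSING}, {"a1": 9, "a2": 7}]
levels = build_levels(table1, ["a1", "a2"],
                      [lambda v: (v + 1) // 2, lambda v: "低" if v <= 3 else "高"])
print(levels[1], levels[2], sep="\n")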
在經(jīng)典的粗糙集方法中,每個(gè)對(duì)象在某個(gè)屬性上只能取一個(gè)值.但是人們?cè)趯?shí)際處理問(wèn)題時(shí),可能需要在不同的粒度層次上用不同的值來(lái)表示.本文介紹了不完備多粒度決策系統(tǒng)的概念:對(duì)協(xié)調(diào)的系統(tǒng),通過(guò)局部最優(yōu)粒度和局部相對(duì)約簡(jiǎn)來(lái)挖掘基于局部最優(yōu)粒度的規(guī)則;對(duì)不協(xié)調(diào)的系統(tǒng),通過(guò)廣義局部最優(yōu)粒度和廣義局部相對(duì)約簡(jiǎn)來(lái)挖掘基于廣義局部最優(yōu)粒度的規(guī)則;同時(shí)結(jié)合例子分別給出了規(guī)則提取的算法.這種屬性約簡(jiǎn)與規(guī)則提取盡可能在合適的粒度上進(jìn)行,這對(duì)于多粒度決策系統(tǒng)的知識(shí)獲取具有重要意義.在后續(xù)的研究中,將討論多粒度序信息系統(tǒng)中最優(yōu)粒度的選擇問(wèn)題.
[1]Zadeh L A. Fuzzy sets and information granularity[G]//Advances in Fuzzy Set Theory and Applications. Amsterdam: North-Holland, 1979: 3-18
[2]Hobbs J R. Granularity[C]//Proc of the 9th Int Joint Conf on Artificial Intelligence. San Francisco, CA: Morgan Kaufmann, 1985: 432-435
[3]Lin T Y. Granular computing on binary relations I: Data mining and neighborhood systems[G]//Rough Sets in Knowledge Discovery. Heidelberg: Physica-Verlag, 1998: 107-121
[4]Lin T Y. Granular computing: Structures, representations, and applications[G]//LNAI 2639: Proc of the 9th Int Conf on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. Berlin: Springer, 2003: 16-24
[5]Yao Yiyu. Granular computing: Basic issues and possible solutions[C]//Proc of the 5th Joint Conf on Computing and Information. Durham, NC: Duke University Press, 2000: 186-189
[6]Zhang Ling, Zhang Bo. Quotient Space Based Problem Solving: A Theoretical Foundation of Granular Computing[M]. Beijing: Tsinghua University Press, 2014 (in Chinese)(張鈴, 張鈸. 基于商空間的問(wèn)題求解: 粒度計(jì)算的理論基礎(chǔ)[M]. 北京: 清華大學(xué)出版社, 2014)
[7]Liang Jiye, Qian Yuhua, Li Deyu, et al. Theory and method of granular computing for big data mining[J]. Scientia Sinica: Informationis, 2015, 45(11): 1355-1369 (in Chinese)(梁吉業(yè), 錢(qián)宇華, 李德玉, 等. 大數(shù)據(jù)挖掘的粒計(jì)算理論與方法[J]. 中國(guó)科學(xué): 信息科學(xué), 2015, 45(11): 1355-1369)
[8]Yao J T, Vasilakos A V, Pedrycz W. Granular computing: Perspectives and challenges[J]. IEEE Trans on Cybernetics, 2013, 43(6): 1977-1989
[9]Liu Qing, Qiu Taorong, Liu Lan. The research of granular computing based on nonstandard analysis[J]. Chinese Journal of Computers, 2015, 38(8): 1618-1627 (in Chinese)(劉清, 邱桃榮, 劉斕. 基于非標(biāo)準(zhǔn)分析的粒計(jì)算研究[J]. 計(jì)算機(jī)學(xué)報(bào), 2015, 38(8): 1618-1627)
[10]Hu Jie, Li Tianrui, Wang Hongjun, et al. Hierarchical cluster ensemble model based on knowledge granulation[J]. Knowledge-Based Systems, 2016, 91: 179-188
[11]Yang Xibei, Qian Yuhua, Yang Jingyu. Hierarchical structures on multigranulation spaces[J]. Journal of Computer Science and Technology, 2012, 27(6): 1169-1183
[12]Xu Ji, Wang Guoyin, Yu Hong. Review of big data processing based on granular computing[J]. Chinese Journal of Computers, 2015, 38(8): 1497-1517 (in Chinese)(徐計(jì), 王國(guó)胤, 于洪. 基于粒計(jì)算的大數(shù)據(jù)處理[J]. 計(jì)算機(jī)學(xué)報(bào), 2015, 38(8): 1497-1517)
[13]Liu Qing, Liu Qun. Granules and applications of granular computing in logical reasoning[J]. Journal of Computer Research and Development, 2004, 41(4): 546-551 (in Chinese)(劉清, 劉群. 粒及粒計(jì)算在邏輯推理中的應(yīng)用[J]. 計(jì)算機(jī)研究與發(fā)展, 2004, 41(4): 546-551)
[14]Hu Qinghua, Yu Daren, Xie Zongxia, et al. Fuzzy probabilistic approximation spaces and their information measures[J]. IEEE Trans on Fuzzy Systems, 2006, 14 (2): 191-201
[15]Zhao Suyun, Wang Xizhao, Chen Degang, et al. Nested structure in parameterized rough reduction[J]. Information Sciences, 2013, 248(6): 130-150
[16]Shao Mingwen, Leung Y, Wu Weizhi. Rule acquisition and complexity reduction in formal decision contexts[J]. International Journal of Approximate Reasoning, 2014, 55(1): 259-274
[17]Zhang Qinghua, Xue Yubin, Wang Guoyin. Optimal approximation sets of rough sets[J]. Journal of Software, 2016, 27(2): 295-308 (in Chinese)(張清華, 薛玉斌, 王國(guó)胤. 粗糙集的最優(yōu)近似集[J]. 軟件學(xué)報(bào), 2016, 27(2): 295-308)
[18]Duan Jie, Hu Qinghua, Zhang Lingjun, et al. Feature selection for multi-label classification based on neighborhood rough sets[J]. Journal of Computer Research and Development, 2015, 52(1): 56-65 (in Chinese)(段潔, 胡清華, 張靈均, 等. 基于鄰域粗糙集的多標(biāo)記分類(lèi)特征選擇算法[J]. 計(jì)算機(jī)研究與發(fā)展, 2015, 52(1): 56-65)
[19]Pawlak Z. Rough sets[J]. International Journal of Computer and Information Science, 1982, 11(5): 341-356
[20]Deng Dayong, Xu Xiaoyu, Huang Houkuan. Concept drifting detection for categorical evolving data based on parallel reducts[J]. Journal of Computer Research and Development, 2015, 52(5): 1071-1079 (in Chinese)(鄧大勇, 徐小玉, 黃厚寬. 基于并行約簡(jiǎn)的概念漂移探測(cè)[J]. 計(jì)算機(jī)研究與發(fā)展, 2015, 52(5): 1071-1079)
[21]Fu Zhiyao, Gao Ling, Sun Qian, et al. Evaluation of vulnerability severity based on rough sets and attributes reduction[J]. Journal of Computer Research and Development, 2016, 53(5): 1009-1017 (in Chinese)(付志耀, 高嶺, 孫騫, 等. 基于粗糙集的漏洞屬性約簡(jiǎn)及嚴(yán)重性評(píng)估[J]. 計(jì)算機(jī)研究與發(fā)展, 2016, 53(5): 1009-1017)
[22]Zhang Wenxiu, Mi Jusheng, Wu Weizhi. Knowledge reductions in inconsistent information systems[J]. Chinese Journal of Computers, 2003, 26(1): 12-18 (in Chinese)(張文修, 米據(jù)生, 吳偉志. 不協(xié)調(diào)目標(biāo)信息系統(tǒng)的知識(shí)約簡(jiǎn)[J]. 計(jì)算機(jī)學(xué)報(bào), 2003, 26(1): 12-18)
[23]Wang Guoyin, Yu Hong, Yang Dachun. Decision table reduction based on conditional information entropy[J]. Chinese Journal of Computers, 2002, 25(7): 759-766 (in Chinese)(王國(guó)胤, 于洪, 楊大春. 基于條件信息熵的決策表約簡(jiǎn)[J]. 計(jì)算機(jī)學(xué)報(bào), 2002, 25(7): 759-766)
[24]Mi Jusheng, Wu Weizhi, Zhang Wenxiu. Approaches to knowledge reduction based on variable precision rough set model[J]. Information Sciences, 2004, 159(3-4): 255-272
[25]Miao Duoqian, Hu Guirong. A Heuristic algorithm for reduction of knowledge[J]. Journal of Computer Research and Development, 1999, 36(6): 681-684 (in Chinese)(苗奪謙, 胡桂榮. 知識(shí)約簡(jiǎn)的一種啟發(fā)式算法[J]. 計(jì)算機(jī)研究與發(fā)展, 1999, 36(6): 681-684)
[26]Skowron A, Rauszer C. The discernibility matrices and functions in information systems[G]//Intelligent Decision Support—Handbook of Applications and Advances of the Rough Sets Theory. Amsterdam: Kluwer Academic Publishers, 1992: 331-362
[27]Qian Yuhua, Liang Jiye, Yao Yiyu, et al. MGRS: A multi-granulation rough set[J]. Information Sciences, 2010, 180 (6): 949-970
[28]Wu Weizhi, Leung Yee. Theory and applications of granular labelled partitions in multi-scale decision tables[J]. Information Sciences, 2011, 181(18): 3878-3897
[29]Wu Weizhi, Leung Yee. Optimal scale selection for multi-scale decision tables[J]. International Journal of Approximate Reasoning, 2013, 54(8): 1107-1129
[30]Wu Weizhi, Chen Ying, Xu Youhong, et al. Optimal granularity selections in consistent incomplete multi-granular labeled decision systems[J]. Pattern Recognition and Artificial Intelligence, 2016, 29(2): 108-115 (in Chinese)(吳偉志, 陳穎, 徐優(yōu)紅, 等. 協(xié)調(diào)的不完備多粒度標(biāo)記決策系統(tǒng)的最優(yōu)粒度選擇[J]. 模式識(shí)別與人工智能, 2016, 29(2): 108-115)
[31]Wu Weizhi, Gao Cangjian, Li Tongjun. Ordered granular labeled structures and rough approximations[J]. Journal of Computer Research and Development, 2014, 51(12): 2623-2632 (in Chinese)(吳偉志, 高倉(cāng)健, 李同軍. 序粒度標(biāo)記結(jié)構(gòu)及其粗糙近似[J]. 計(jì)算機(jī)研究與發(fā)展, 2014, 51(12): 2623-2632)
[32]Gu Shenming, Wu Weizhi. On knowledge acquisition in multi-scale decision systems[J]. International Journal of Machine Learning and Cybernetics, 2013, 4(5): 477-486
[33]Gu Shenming, Wu Weizhi. Knowledge acquisition in inconsistent multi-scale decision systems[G]//LNCS 6954: Proc of the 6th Int Conf on Rough Sets and Knowledge Technology. Berlin: Springer, 2011: 669-678
[34]Wu Weizhi. Attribute reduction based on evidence theory in incomplete decision systems[J]. Information Sciences, 2008, 178(5): 1355-1371
[35]Kryszkiewicz M. Rough set approach to incomplete information systems[J]. Information Sciences, 1998, 112(1-4): 39-49
[36]Yang Xibei, Song Xiaoning, Chen Zehua, et al. On multigranulation rough sets in incomplete information system[J]. International Journal of Machine Learning and Cybernetics, 2012, 3(3): 223-232
Gu Shenming, born in 1970. Professor. Senior member of CCF. His main research interests include rough set theory, granular computing, concept lattice, machine learning, etc.
Gu Jinyan, born in 1993. Master. Her main research interests include rough set theory, granular computing, artificial intelligence, etc (974795816@qq.com).
Wu Weizhi, born in 1964. Professor and PhD supervisor. Senior member of CCF. His main research interests include rough set theory, granular computing, random set theory, concept lattice, approximate reasoning, etc (wuwz@zjou.edu.cn).
Li Tongjun, born in 1966. PhD and professor. His main research interests include rough set theory, granular computing, concept lattice, etc (litj@zjou.edu.cn).
Chen Chaojun, born in 1992. Master. Her main research interests include rough set theory, granular computing, artificial intelligence, etc (972237554@qq.com).
Local Optimal Granularity Selections in Incomplete Multi-Granular Decision Systems
Gu Shenming, Gu Jinyan, Wu Weizhi, Li Tongjun, and Chen Chaojun
(School of Mathematics, Physics and Information Science, Zhejiang Ocean University, Zhoushan, Zhejiang 316022) (Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province (Zhejiang Ocean University), Zhoushan, Zhejiang 316022)
Granular computing is an important approach to knowledge representation and data mining. From the viewpoint of granular computing, a granule is a larger unit composed of a number of smaller particles. In many real-life applications, data sets with hierarchical scale structures yield different granules at different levels of scale, so many people apply granular computing to problem solving by considering multiple levels of granularity. This allows us to focus on solving a problem at the most appropriate level of granularity while ignoring unimportant and irrelevant details. In this paper, rule extraction based on local optimal granularities in incomplete multi-granular decision systems is explored. Firstly, the concept of incomplete multi-granular decision systems is introduced. Then the notions of the optimal granularity and the local optimal granularity in consistent incomplete multi-granular decision systems are defined, and the approaches to attribute reduction and rule extraction based on local optimal granularities are illustrated; the generalized decision in inconsistent incomplete multi-granular decision systems is further introduced, the generalized optimal granularity and the generalized local optimal granularity are defined, and the approaches to attribute reduction and rule extraction based on generalized local optimal granularities are investigated. Finally, experimental results on public data sets are discussed.
decision system; granular computing; local optimal granularity; multi-granularity; rule extraction
2016-05-24;
2016-10-10
國(guó)家自然科學(xué)基金項(xiàng)目(61602415,61573321,61272021,41631179);浙江省自然科學(xué)基金項(xiàng)目(LY14F030001);浙江省海洋科學(xué)重中之重學(xué)科開(kāi)放基金項(xiàng)目(20160102) This work was supported by the National Natural Science Foundation of China (61602415, 61573321, 61272021, 41631179), the Natural Science Foundation of Zhejiang Province of China (LY14F030001), and the Open Foundation from Marine Sciences in the Most Important Subjects of Zhejiang Province (20160102).
TP18