Industrial process fault detection based on IMDS-DLNS method
FENG Liwei, SUN Liwen, GU Huan, LI Yuan
(1.College of Science, Shenyang University of Chemical Technology, Shenyang, Liaoning 110142, China; 2.College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang, Liaoning 110142, China; 3.Key Laboratory of Intelligent Technology for Chemical Process Industry of Liaoning Province, Shenyang, Liaoning 110142, China)
Abstract: When multidimensional scaling (MDS) is used to reduce the dimensionality of high-dimensional data, a new sample cannot be embedded in the low-dimensional space because no mapping matrix is available. To address this problem, an incremental multidimensional scaling (IMDS) method is proposed. First, dual local nearest neighbor standardization (DLNS) is introduced to handle the multiple centers and markedly different variances that remain in the data after IMDS dimensionality reduction. Second, the Hotelling T² statistic is used to monitor the process, yielding a fault detection method (IMDS-DLNS) that combines incremental multidimensional scaling with dual local nearest neighbor standardization. Finally, IMDS-DLNS is compared with PCA, KPCA, and FD-KNN on a numerical simulation and on the penicillin fermentation process. The results show that IMDS-DLNS achieves a higher fault detection rate than the other methods. The method detects faults well in multivariable, multimodal processes, helps safeguard product quality and production safety, and can serve as a reference for research on industrial process fault detection.
Keywords: other disciplines of automatic control technology; multimodality; incremental multidimensional scaling; dual local nearest neighbor standardization; fault detection
CLC number: TP277 Document code: A
DOI: 10.7535/hbkd.2022yx03007
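With the rapid development of science and technology, the scale and complexity of industrial production keep increasing, and detection and diagnosis techniques based on process monitoring are receiving more attention as a means of ensuring production safety.
In data-driven process monitoring, methods such as principal component analysis (PCA) [1-2] and partial least squares (PLS) [3-4] are widely used, and many researchers have studied them in depth. XIU et al. [5] extended PCA by introducing a sparse term to reduce process noise and integrating hypergraph Laplacian regularization into the objective function of robust principal component analysis (RPCA), which yields a Laplacian regularized robust principal component analysis (LRPCA) fault detection method; they also proposed an efficient alternating direction method of multipliers to optimize LRPCA and established a local convergence model for it. ZHAO et al. [6] used Bayesian-inference weighting to fuse process variables with quality variables and built a PCA model on the process variables that carry quality information, which effectively improved the fault detection rate. However, when the data have multiple centers and regions of different density, these methods show serious drawbacks in detection [7].
To address the multi-center problem, HE et al. [8] proposed fault detection using the k-nearest neighbor rule (FD-KNN), which builds the detection statistic from the cumulative sum of a sample's distances to its nearest neighbors. When the modes differ in dispersion, FD-KNN misses some weak faults [9]. To solve this problem, GUO et al. [10] proposed a probability-density-based KNN method for multimodal fault detection, which uses probability density to decide which mode a new sample belongs to and so prevents weak faults in a low-dispersion mode from being masked by the normal data of a high-dispersion mode. Feature extraction can effectively reduce the heavy computation that KNN incurs by repeatedly calculating Euclidean distances between high-dimensional samples. ZHANG et al. [11] considered the difference between the true and estimated principal component scores and proposed a k-nearest-neighbor fault detection method based on principal component difference. That method extracts features through principal components, so it captures only the global information of the samples and ignores their internal structure. Multidimensional scaling (MDS) [12-14] was proposed to preserve the internal structure while extracting the main features of the samples. It differs from PCA in that PCA takes the covariance matrix as input, whereas MDS takes the distance matrix. However, MDS has no mapping matrix for embedding a new sample in the low-dimensional space, which lowers projection efficiency.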
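The masking effect described for FD-KNN can be illustrated with a small sketch. It assumes the common formulation in which the statistic is the sum of squared distances to the k nearest training samples and the control limit is an empirical quantile of the leave-one-out training statistics; the function names, the toy grids, and the value of k are illustrative choices, not taken from the paper.

```python
import math

def knn_stat(x, train, k, skip_self=False):
    """Sum of the squared Euclidean distances from x to its k nearest training samples."""
    d2 = sorted(sum((a - b) ** 2 for a, b in zip(x, t)) for t in train)
    if skip_self:
        d2 = d2[1:]  # x is itself in train: drop its zero self-distance
    return sum(d2[:k])

def control_limit(train, k, alpha=0.05):
    """Empirical (1 - alpha) quantile of the leave-one-out training statistics."""
    stats = sorted(knn_stat(x, train, k, skip_self=True) for x in train)
    return stats[min(len(stats) - 1, math.ceil((1 - alpha) * len(stats)) - 1)]

# Toy two-mode training set: a dense 5x5 grid (spacing 0.1) and a sparse one (spacing 2).
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(20 + 2.0 * i, 20 + 2.0 * j) for i in range(5) for j in range(5)]
train = dense + sparse
k = 3
limit = control_limit(train, k)

weak_fault = (1.2, 0.2)     # far from the dense mode relative to its 0.1 spacing
gross_fault = (10.0, 10.0)  # far from both modes
print(knn_stat(weak_fault, train, k) > limit)   # False: masked by the sparse mode
print(knn_stat(gross_fault, train, k) > limit)  # True
```

In this toy case the global limit is set by the sparse mode, so a deviation that is large relative to the dense mode still falls below it.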
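To deal with the high dimensionality of industrial process data, the difficulty of embedding new samples with MDS, and multimodality, this paper proposes a fault detection method based on incremental multidimensional scaling and dual local nearest neighbor standardization (IMDS-DLNS). First, IMDS extracts the main features of the data while keeping the Euclidean distances between samples approximately unchanged. Second, dual local nearest neighbor standardization is applied to the feature data so that the data merge into a single mode and the variables approximately follow a multivariate Gaussian distribution. Finally, the T² statistic is used to monitor the process.
1 Multidimensional scaling
2 Dual local nearest neighbor fault detection strategy based on incremental multidimensional scaling (IMDS-DLNS)
To project new samples online, this paper introduces an incremental technique that turns MDS into IMDS. DLNS is then used to merge the modes of the data projected by IMDS, and the Hotelling T² statistic is used to monitor the process.
2.1 Incremental multidimensional scaling
MDS projects data to a low-dimensional space by computing the inner-product matrix of the training samples. However, it can only project all samples in the high-dimensional space as a whole; it has no mapping matrix, so a new sample cannot be projected directly. To embed a new sample, the sample has to be pooled with the training samples and the model rebuilt, which significantly increases the computational load. This section therefore proposes incremental multidimensional scaling to project new samples.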
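The idea of embedding a new sample without rebuilding the model can be sketched as follows. The paper's own IMDS derivation is not reproduced in this section, so the sketch substitutes the standard out-of-sample extension of classical MDS: the new sample's squared distances to the training samples are double-centred against the training distance matrix and projected onto the stored eigenvectors. The function names `classical_mds` and `embed_new` are hypothetical.

```python
import numpy as np

def classical_mds(X, d):
    """Classical MDS of the rows of X into d dimensions."""
    n = X.shape[0]
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    J = np.eye(n) - np.ones((n, n)) / n                         # centring matrix
    B = -0.5 * J @ D2 @ J                                       # inner-product matrix
    w, U = np.linalg.eigh(B)                                    # ascending eigenvalues
    idx = np.argsort(w)[::-1][:d]                               # d largest eigenvalues
    lam, V = w[idx], U[:, idx]
    Y = V * np.sqrt(lam)                                        # training embedding
    return Y, D2, lam, V

def embed_new(x_new, X, D2, lam, V):
    """Embed one new sample without re-solving the eigenproblem (assumed stand-in for IMDS)."""
    d2_new = np.sum((X - x_new) ** 2, axis=1)
    # Double-centre the new distance vector against the training distances.
    b = -0.5 * (d2_new - d2_new.mean() - D2.mean(axis=1) + D2.mean())
    return (V.T @ b) / np.sqrt(lam)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))
X = rng.standard_normal((30, 2)) @ A      # rank-2 data living in 5 dimensions
x_new = rng.standard_normal(2) @ A        # new sample in the same subspace
Y, D2, lam, V = classical_mds(X, d=2)
y_new = embed_new(x_new, X, D2, lam, V)
```

Because only the stored eigenvectors, eigenvalues, and training distances are reused, the cost per new sample is one distance computation against the training set rather than a new eigendecomposition.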
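2.2 Dual local nearest neighbor standardization
Dual local nearest neighbor standardization standardizes a sample using two layers of its neighbors. It is an effective data-processing strategy for multimodal processes: it handles the case in which a sample's neighbors span two modes, and it converts multimodal data into single-mode data [15-16].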
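The two-layer neighbor idea can be sketched as below. The exact DLNS formula is not given in this section, so the code adopts one plausible reading as an assumption: the first layer is the sample's nearest training neighbor, the second layer is that neighbor's k nearest training samples, and the sample is standardized with the mean and standard deviation of the second layer. The function name `dlns`, the value k = 10, and the toy data are illustrative.

```python
import numpy as np

def dlns(x, train, k=10, exclude_self=False):
    """Standardize x with the mean and std of the k neighbours of its nearest neighbour.

    Assumed two-layer scheme; not necessarily the paper's exact definition.
    """
    order = np.argsort(np.linalg.norm(train - x, axis=1))
    first = order[1] if exclude_self else order[0]      # layer 1: nearest neighbour of x
    second = np.argsort(np.linalg.norm(train - train[first], axis=1))[:k]
    nbrs = train[second]                                # layer 2: its k nearest samples
    return (x - nbrs.mean(axis=0)) / nbrs.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
base = rng.standard_normal((200, 2))
mode_a = 0.1 * base           # dense mode centred at 0
mode_b = 50.0 + 2.0 * base    # sparse mode: a shifted, stretched copy of mode_a
train = np.vstack([mode_a, mode_b])

# Standardize every training sample against its own two-layer neighbourhood.
Z = np.array([dlns(x, train, exclude_self=True) for x in train])
```

Mode B is deliberately built as a shifted and stretched copy of mode A, so after this local standardization the two modes coincide up to rounding, which makes the removal of centre and variance differences easy to check.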
2.3 IMDS-DLNS method
IMDS computes the low-dimensional mapping of a new sample on its own, which avoids recomputing the training samples. Although this reduces data complexity and computation, data that are multimodal with different variances remain so after IMDS and therefore do not meet the assumptions of the T² statistic. IMDS is therefore combined with DLNS to remove the differences between modes caused by unequal variances and to even out the density of the modes, which provides a sound basis for computing the statistic. The Hotelling [18] T² statistic is used to monitor the process and detect faults.
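A minimal sketch of T² monitoring on standardized features follows. The control limit here is an empirical quantile of the training statistics, which is an assumption made to keep the example NumPy-only; limits derived from the F distribution are also common, and this section does not state which one the paper uses. The names `t2` and `fit_t2` are hypothetical, and the features are assumed to have at least two columns.

```python
import numpy as np

def t2(Z, mu, S_inv):
    """Hotelling T² of each row of Z: (z - mu)^T S^{-1} (z - mu)."""
    C = np.atleast_2d(Z) - mu
    return np.einsum("ij,jk,ik->i", C, S_inv, C)

def fit_t2(Z_train, alpha=0.01):
    """Estimate mean, inverse covariance and an empirical (1 - alpha) control limit."""
    mu = Z_train.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Z_train, rowvar=False))
    limit = np.quantile(t2(Z_train, mu, S_inv), 1 - alpha)
    return mu, S_inv, limit

rng = np.random.default_rng(0)
Z_train = rng.standard_normal((500, 2))   # stand-in for single-mode, near-Gaussian features
mu, S_inv, limit = fit_t2(Z_train)

normal_sample = np.array([0.0, 0.0])
fault_sample = np.array([6.0, 6.0])
is_normal_alarm = t2(normal_sample, mu, S_inv)[0] > limit
is_fault_alarm = t2(fault_sample, mu, S_inv)[0] > limit
```

A sample is flagged as faulty when its T² exceeds the limit, which is why the features must first be brought close to a single Gaussian mode.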
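3 Case studies
A multimodal numerical simulation with markedly different variances and the penicillin fermentation process are used to compare the proposed method with PCA, KPCA, and FD-KNN and to verify the effectiveness of IMDS-DLNS.
3.1 Numerical simulation
Fig. 2 compares the detection results of the four methods. Fig. 2 a) shows the PCA result, in which the first 800 points are training data. The data remain multimodal after PCA, whereas the T² statistic requires a unimodal Gaussian distribution, so none of the fault points is detected in the principal component space. Fig. 2 b) shows the KPCA result, which is poor for multimodal data, mainly because the kernel mapping does not separate the fault points from the normal samples and all the fault data fall inside the principal component space. Fig. 2 c) shows the FD-KNN result, in which the faults are not detected. The main reason is that FD-KNN is a global detection method, so its control limit is determined by the sample distribution of the mode with the larger variance; the step fault generated in this section is introduced in the dense mode, so all fault points lie below the control limit. Fig. 2 d) shows the IMDS-DLNS result: all fault points in the multimodal process are detected. IMDS extracts the internal information among the sample points, and DLNS weakens the difference between the two modes, so the fault data are detected effectively. Fig. 3 shows the sample distribution after IMDS-DLNS processing: the two original modes merge into a single mode that follows a unimodal Gaussian distribution.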
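3.2 Penicillin fermentation process
Penicillin is the antibiotic of first choice against susceptible bacteria. Its fermentation process has two stages [20-21]:
1) Substrate consumption (0~43 h): the penicillium begins to grow and multiply, preparing for later penicillin production;
2) Penicillin synthesis (44 h to the end): the penicillium begins to synthesize penicillin, and material has to be fed continuously into the vessel to promote product formation.
The Pensim simulation platform [22] was used to simulate the fermentation and obtain one batch of normal data for training. The reaction time was set to 400 h and the sampling interval to 0.5 h; all other parameters were left at the system defaults.
Two types of fault, step and ramp, were considered, and two fault sets were generated for each type. Fault f1: a step fault of -0.25% amplitude on the aeration rate during 10~40 h; fault f2: a ramp fault of 0.05 (L/h) amplitude on the aeration rate during 100~200 h; fault f3: a step fault of 5% amplitude on the agitator power during 150~300 h; fault f4: a ramp fault of -1 (W) amplitude on the agitator power during 20~80 h.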
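The fault definitions above can be turned into a small injection sketch. Several points are assumptions rather than statements from the paper: the step amplitude is read as a percentage of the current value, the ramp amplitude is read as a slope per hour, the time of sample i is i × 0.5 h starting from 0, and the nominal values 8.6 and 30.0 are placeholders for illustration only.

```python
def inject_step(series, t_start, t_end, percent, dt=0.5):
    """Return a copy with samples in [t_start, t_end] scaled by (1 + percent / 100)."""
    out = list(series)
    for i in range(len(out)):
        if t_start <= i * dt <= t_end:
            out[i] = series[i] * (1 + percent / 100.0)
    return out

def inject_ramp(series, t_start, t_end, slope, dt=0.5):
    """Return a copy with slope * (t - t_start) added in [t_start, t_end] (slope per hour)."""
    out = list(series)
    for i in range(len(out)):
        t = i * dt
        if t_start <= t <= t_end:
            out[i] = series[i] + slope * (t - t_start)
    return out

n_samples = int(400 / 0.5)           # 400 h at a 0.5 h sampling interval -> 800 samples
aeration = [8.6] * n_samples         # placeholder nominal aeration rate
agitator = [30.0] * n_samples        # placeholder nominal agitator power

f1 = inject_step(aeration, 10, 40, -0.25)    # step, -0.25 %
f2 = inject_ramp(aeration, 100, 200, 0.05)   # ramp, 0.05 per hour (assumed reading)
f3 = inject_step(agitator, 150, 300, 5.0)    # step, +5 %
f4 = inject_ramp(agitator, 20, 80, -1.0)     # ramp, -1 per hour (assumed reading)
```

With these settings each batch has 800 samples, which is consistent with the 800 training points mentioned for the detection figures.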
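The penicillin process has 18 variables. The 12 variables with an important influence on the process were selected as monitored variables, as listed in Table 1.
To verify that the penicillin data approximately follow a Gaussian distribution after IMDS-DLNS processing, a normal quantile-quantile plot (QQ plot) was drawn for each variable. Fig. 4 shows the QQ plot of the first variable: the points of the processed data lie approximately on a straight line, so this variable approximately follows a Gaussian distribution.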
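The straight-line criterion of a QQ plot can also be checked numerically. The sketch below pairs the ordered, standardized sample values with theoretical normal quantiles and reports their Pearson correlation; the plotting positions (i + 0.5)/n and the use of a correlation as a summary are choices made for this illustration, and the data are synthetic, not the paper's.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pairs (theoretical normal quantile, ordered standardized sample value)."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    ordered = sorted((v - m) / s for v in sample)
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theo, ordered))

def qq_correlation(sample):
    """Pearson correlation of the QQ points; close to 1 when the plot is nearly straight."""
    pts = qq_points(sample)
    tx, ty = zip(*pts)
    mx, my = mean(tx), mean(ty)
    num = sum((a - mx) * (b - my) for a, b in pts)
    den = (sum((a - mx) ** 2 for a in tx) * sum((b - my) ** 2 for b in ty)) ** 0.5
    return num / den

random.seed(0)
gaussian = [random.gauss(0, 1) for _ in range(800)]
# Two modes with different centres and spreads, as before any standardization.
bimodal = [random.gauss(0, 0.1) for _ in range(400)] + [random.gauss(10, 2) for _ in range(400)]

r_gaussian = qq_correlation(gaussian)
r_bimodal = qq_correlation(bimodal)
```

A near-Gaussian variable gives a correlation very close to 1, while a strongly bimodal one gives a visibly lower value, matching what a bent QQ plot would show.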
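Table 2 lists the fault detection results of IMDS-DLNS, PCA, KPCA, and FD-KNN on the penicillin fermentation process. PCA and KPCA have low detection rates for faults f1, f2, and f4, because penicillin fermentation is a multimodal process and does not meet the assumptions of the T² statistic. Fault f3 has a large deviation and its fault points clearly depart from the normal samples, so PCA and KPCA detect it effectively. FD-KNN detects 100% of fault f3 but has low detection rates for the other faults, mainly because the modes of the penicillin fermentation process differ in density, so the control limit is determined by the data of the sparse mode.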
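For reference, a detection rate of the kind reported in such a table can be computed as below. The definitions are the usual ones and are assumed here, since this section does not spell them out: the fault detection rate is the share of faulty samples whose statistic exceeds the control limit, and the false alarm rate is the share of normal samples that do. The numbers are made up for the example.

```python
def detection_rates(stats, limit, is_fault):
    """Return (fault detection rate, false alarm rate) for one test batch."""
    alarms = [s > limit for s in stats]
    n_fault = sum(is_fault)
    n_normal = len(is_fault) - n_fault
    detected = sum(a and f for a, f in zip(alarms, is_fault))
    false_alarms = sum(a and not f for a, f in zip(alarms, is_fault))
    fdr = detected / n_fault if n_fault else float("nan")
    far = false_alarms / n_normal if n_normal else float("nan")
    return fdr, far

# Illustrative statistics for eight samples; samples 4 to 6 are the faulty ones.
stats = [1.0, 2.0, 12.0, 3.0, 15.0, 9.0, 11.0, 2.0]
is_fault = [False, False, False, False, True, True, True, False]
fdr, far = detection_rates(stats, limit=10.0, is_fault=is_fault)
print(round(fdr, 3), round(far, 3))  # 0.667 0.2
```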
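Fig. 5 shows the IMDS-DLNS detection result for the f1 batch. The first 800 points are the normal data used to train the model. IMDS-DLNS turns the multimodal penicillin data into single-mode data and extracts their main features, so it detects most of the fault data in this process.
4 Conclusion
To address the missed faults and false alarms on normal data that arise when traditional methods such as PCA and KPCA are applied to multimodal processes, a fault detection method based on IMDS-DLNS was proposed. Theoretical analysis and experimental results both show that the method solves the problem that MDS cannot map new samples and detects faults in multimodal processes with multiple centers and markedly different variances. It detects faults more effectively than the traditional methods and offers a reference for industrial development and production safety management.
The method requires the distances between samples, so as the sample size grows the running time increases and monitoring becomes more costly. Future work will optimize IMDS-DLNS to improve computational efficiency.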
References:
[1]CAO L J,CHUA K S,CHONG W K,et al.A comparison of PCA,KPCA and ICA for dimensionality reduction in support vector machine[J].Neurocomputing,2003,55(1/2):321-336.
[2]YUE H H,QIN S J.Reconstruction-based fault identification using a combined index[J].Industrial & Engineering Chemistry Research,2001,40(20):4403-4414.
[3]KONG Xiangyu,LI Qiang,AN Qiusheng,et al.Quality-related fault detection based on the score reconstruction associated with partial least squares[J].Control Theory & Applications,2020,37(11):2321-2332.
[4]HENSELER J,RINGLE C M,SARSTEDT M.Testing measurement invariance of composites using partial least squares[J].International Marketing Review,2016,33(3):405-431.
[5]XIU X C,YANG Y,KONG L C,et al.Laplacian regularized robust principal component analysis for process monitoring[J].Journal of Process Control,2020,92:212-219.
[6]ZHAO Shuai,SONG Bing,SHI Hongbo.Quality-related fault detection based on weighted mutual information principal component analysis[J].CIESC Journal,2018,69(3):962-973.
[7]DENG Jiawei,DENG Xiaogang,CAO Yuping,et al.Incipient fault diagnosis method of nonlinear chemical process based on weighted statistical local KPCA[J].CIESC Journal,2019,70(7):2594-2605.
[8]HE Q P,WANG J.Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes[J].IEEE Transactions on Semiconductor Manufacturing,2007,20(4):345-354.
[9]VERDIER G,FERREIRA A.Adaptive Mahalanobis distance and k-nearest neighbor rule for fault detection in semiconductor manufacturing[J].IEEE Transactions on Semiconductor Manufacturing,2011,24(1):59-68.
[10]GUO J Y,WANG X,LI Y.kNN based on probability density for fault detection in multimodal processes[J].Journal of Chemometrics,2018,32(7).DOI:10.1002/cem.3021.
[11]ZHANG C,GUO Q X,LI Y.Fault detection in the Tennessee Eastman benchmark process using principal component difference based on k-nearest neighbors[J].IEEE Access,2020,8:49999-50009.
[12]SAEED N,NAM H,HAQ M I U,et al.A survey on multidimensional scaling[J].ACM Computing Surveys,2019,51(3):1-25.
[13]GOWER J C.Some distance properties of latent root and vector methods used in multivariate analysis[J].Biometrika,1966,53(3/4):325-338.
[14]COX F,COX M A A.Multidimensional scaling[J].Journal of the Royal Statistical Society:Series A(Statistics in Society),1996,159(1):184-185.
[15]FENG Liwei,ZHANG Cheng,LI Yuan,et al.Fault detection for multistage process based on improved local neighborhood standardization and kNN[J].Journal of Computer Applications,2018,38(7):2130-2135.
[16]MA H H,HU Y,SHI H B.A novel local neighborhood standardization strategy and its application in fault detection of multimode processes[J].Chemometrics and Intelligent Laboratory Systems,2012,118:287-300.
[17]MA Hehe.Fault Detection of Complex Industrial Processes Based on Data-driven Methods[D].Shanghai:East China University of Science and Technology,2013.
[18]VALLE S,LI W H,QIN S J.Selection of the number of principal components:The variance of the reconstruction error criterion with a comparison to other methods[J].Industrial & Engineering Chemistry Research,1999,38(11):4389-4401.
[19]FENG Xiongfeng,YANG Xianhui,XU Yongmao.Squared prediction error analysis of multivariate statistical process control[J].Journal of Tsinghua University(Science and Technology),1999,39(7):41-45.
[20]ABBASI M A,KHAN A Q,MUSTAFA G,et al.Data-driven fault diagnostics for industrial processes:An application to penicillin fermentation process[J].IEEE Access,2021,9:65977-65987.
[21]ZHU J L,WANG Y Q,ZHOU D H,et al.Batch process modeling and monitoring with local outlier factor[J].IEEE Transactions on Control Systems Technology,2019,27(4):1552-1565.
[22]LIBOTTE G B,LOBATO F S,PLATT G M,et al.Robust multi-objective singular optimal control of penicillin fermentation process[J].Global Journal of Researches in Engineering,2020,20(3):1-9.