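Cardiorespiratory Sound Separation Method Based on Fully Connected LSTM
Lei Zhibin, Chen Junlin
(Guangdong University of Technology)
Keywords: cardiorespiratory sound separation; non-negative matrix factorization; long short-term memory network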
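According to World Health Organization statistics, about 21.6 million deaths from non-communicable diseases worldwide in 2015 (about 55%) were related to cardiopulmonary diseases [1]. Cardiopulmonary disease has become one of the major diseases that seriously threaten human life and health, and fast, accurate diagnosis of the cardiopulmonary system allows conditions to be detected early and helps patients recover sooner. Clinical methods for diagnosing the cardiopulmonary system include electrocardiography, chest fluoroscopy, and auscultation. Compared with the other methods, auscultation is fast, non-invasive, and low-cost, and is therefore widely used for early diagnosis of the cardiopulmonary system. However, the heart sounds and lung sounds collected by a stethoscope often overlap, which interferes with clinical diagnosis. Separating clean heart sounds and lung sounds from the mixed cardiorespiratory signal is therefore of practical importance for improving auscultation quality and assisting clinical diagnosis.
Because heart sounds and lung sounds interfere with each other in the 60 Hz to 320 Hz band, conventional band-pass filtering [2] cannot separate them completely. To address this problem, researchers have proposed many methods, including methods based on adaptive filtering [3-5], methods based on wavelet filtering [6-8], and blind source separation methods [9-12].
Deep learning has attracted attention for its strong ability to learn nonlinear mappings between inputs and targets, and is widely used in speech recognition [13], speech enhancement [14], and speech separation [15]. To the authors' knowledge, deep learning has not yet been applied to cardiorespiratory sound separation. Long short-term memory (LSTM) neural networks can learn temporal correlations in the input data, which matches the characteristics of cardiorespiratory sound signals. This paper proposes an LSTM-based deep neural network for separating heart sounds and lung sounds; to reduce the number of parameters to optimize and to speed up convergence, the LSTM layers adopt a fully connected structure.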
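In a quiet environment, the mixed cardiorespiratory signal collected by an electronic stethoscope can be represented by the following linear mixture model [12]:
With advances in electronics, the white noise in signals collected by electronic stethoscopes is very weak and can be removed by preprocessing [16]. Reference [12] considers that the preprocessed mixed signal contains only heart sounds and lung sounds, and can be represented by the following mathematical model: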
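One common way to write the two models described above is sketched here. The symbol names x(t), h(t), l(t), and n(t) are assumed for illustration and may differ from the notation in [12]: x(t) is the recorded mixture, h(t) the heart sound, l(t) the lung sound, and n(t) the white noise.

```latex
% Linear mixture with additive white noise (quiet environment)
x(t) = h(t) + l(t) + n(t)

% After preprocessing, the noise term is treated as removed
x(t) = h(t) + l(t)
```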
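The proposed cardiorespiratory sound separation model based on fully connected LSTM is shown in Figure 1. The time-frequency spectrogram of the mixed signal is fed as the input feature to the fully connected LSTM neural network, which outputs time-frequency masks for the heart sound and the lung sound. Each mask is multiplied element-wise with the spectrogram of the mixed signal to obtain the separated heart-sound and lung-sound spectrograms. The separated spectrograms are compared with the true heart-sound and lung-sound spectrograms to give a heart-sound error term and a lung-sound error term; the sum of these two error terms is the cost function of the fully connected LSTM neural network.
Figure 1 Cardiorespiratory sound separation model based on fully connected LSTM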
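The masking and error steps of Figure 1 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the error is taken to be a squared error, and the toy data assumes that magnitudes add, which real spectrograms do not satisfy exactly.

```python
import numpy as np

def separate_with_masks(mix_mag, heart_mask, lung_mask):
    """Multiply each time-frequency mask element-wise with the mixture spectrogram."""
    return heart_mask * mix_mag, lung_mask * mix_mag

def separation_cost(est_heart, est_lung, ref_heart, ref_lung):
    """Sum of the heart and lung error terms (squared error is an assumption)."""
    heart_err = np.sum((est_heart - ref_heart) ** 2)
    lung_err = np.sum((est_lung - ref_lung) ** 2)
    return float(heart_err + lung_err)

# Toy data: 5 frames with 129 frequency bins each.
rng = np.random.default_rng(0)
ref_heart = rng.random((5, 129)) + 0.1
ref_lung = rng.random((5, 129)) + 0.1
mix_mag = ref_heart + ref_lung          # toy assumption: magnitudes add

# "Oracle" masks computed from the references; a trained network would predict these.
heart_mask = ref_heart / mix_mag
lung_mask = ref_lung / mix_mag

est_heart, est_lung = separate_with_masks(mix_mag, heart_mask, lung_mask)
cost = separation_cost(est_heart, est_lung, ref_heart, ref_lung)
```

With oracle masks the cost is numerically zero; during training, the network's predicted masks would replace `heart_mask` and `lung_mask`.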
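An LSTM network is formed by replacing the conventional neurons of a traditional neural network with LSTM memory blocks. An LSTM memory block consists of 1 memory cell and 3 gating units; its structure is shown in Figure 2. The state of the memory cell is controlled by the 3 gates (input gate, forget gate, and output gate). The input gate selectively admits the current data into the memory cell; the forget gate regulates how much historical information influences the current cell state; the output gate selectively outputs the cell state. The 3 gates and the separate memory cell give the LSTM network the ability to learn temporal correlations in the input signal and alleviate the vanishing-gradient and exploding-gradient problems during training.
Figure 2 LSTM memory block [17]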
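At each time step, the state of the LSTM memory block is updated in the following 3 steps.
3) Output information. The output gate controls what is output: a sigmoid function decides which parts of the memory cell state to output, a tanh function processes the memory cell state, and the two results are multiplied to give the output value.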
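For reference, one update of a standard LSTM memory block can be sketched in NumPy as below. This follows the textbook LSTM formulation rather than the authors' own equations. The stacked weight layout, the function name `lstm_step`, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One update of a standard LSTM memory block.

    W has shape (4 * n_hidden, n_in + n_hidden); its rows are stacked as
    [input gate, forget gate, candidate, output gate] (an assumed layout).
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0:n])            # input gate: how much new data to admit
    f = sigmoid(z[n:2 * n])        # forget gate: how much history to keep
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the memory cell
    o = sigmoid(z[3 * n:4 * n])    # output gate: which cell state to output
    c_t = f * c_prev + i * g       # update the memory cell state
    h_t = o * np.tanh(c_t)         # output: gate times tanh of the cell state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.standard_normal((4 * n_hidden, n_in + n_hidden)) * 0.1
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.standard_normal((5, n_in)):   # run 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
```

The last two lines of `lstm_step` correspond to the output step described above: the sigmoid gate `o` multiplies the tanh of the cell state.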
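The block diagram of the proposed neural network based on fully connected LSTM is shown in Figure 3.
Figure 3 Block diagram of the neural network based on fully connected LSTM
The spectrogram of the mixed signal is the network input. It is mapped by one fully connected layer and passed to an LSTM network with a 3-layer fully connected structure; the output of this LSTM network is then mapped by another fully connected layer to give the heart-sound and lung-sound time-frequency masks. Here, "fully connected" refers to the connection pattern between layers: the input and output of the current layer are concatenated and used as the input of the next layer. This structure reduces the number of network parameters and speeds up convergence [18].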
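The concatenation pattern can be illustrated with a short NumPy sketch. The LSTM layers are replaced here by a placeholder (a random linear map followed by tanh), so this shows only how the layer widths grow, not the authors' network. The layer sizes 64, 128, and 256 are taken from the experimental setup; all function names are hypothetical.

```python
import numpy as np

def stand_in_layer(x, n_out, rng):
    """Placeholder for an LSTM layer: random linear map followed by tanh."""
    W = rng.standard_normal((x.shape[-1], n_out)) * 0.1
    return np.tanh(x @ W)

def densely_connected_stack(x, layer_sizes, rng):
    """Feed each layer, then concatenate its input and output for the next layer."""
    input_widths = []
    for n_out in layer_sizes:
        input_widths.append(x.shape[-1])
        y = stand_in_layer(x, n_out, rng)
        x = np.concatenate([x, y], axis=-1)   # layer input + layer output
    return x, input_widths

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 64))        # 10 frames after the first 64-unit layer
out, input_widths = densely_connected_stack(frames, [64, 128, 256], rng)
```

With these sizes each layer's input width equals its output width (64, 128, 256), and the final concatenated feature passed to the last mapping layer has width 512.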
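In speech separation, time-frequency masks are commonly used as an intermediate for estimating the spectrograms of the source signals. The time-frequency mask used in this paper is the phase-sensitive mask [19], defined as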
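The phase-sensitive mask is commonly defined as the ratio of source magnitude to mixture magnitude, multiplied by the cosine of the phase difference between source and mixture. A minimal NumPy sketch of that common definition follows; the function name, the `eps` guard against division by zero, and the random toy spectra are assumptions.

```python
import numpy as np

def phase_sensitive_mask(source_stft, mixture_stft, eps=1e-8):
    """PSM = |S| / |Y| * cos(theta_S - theta_Y), computed per time-frequency bin."""
    ratio = np.abs(source_stft) / (np.abs(mixture_stft) + eps)
    phase_diff = np.angle(source_stft) - np.angle(mixture_stft)
    return ratio * np.cos(phase_diff)

# Toy complex spectra: 4 frames with 129 frequency bins each.
rng = np.random.default_rng(0)
heart = rng.standard_normal((4, 129)) + 1j * rng.standard_normal((4, 129))
lung = rng.standard_normal((4, 129)) + 1j * rng.standard_normal((4, 129))
mixture = heart + lung                      # linear mixing in the STFT domain

heart_mask = phase_sensitive_mask(heart, mixture)
lung_mask = phase_sensitive_mask(lung, mixture)
```

Because this mask equals the real part of S/Y, the heart and lung masks of a linear mixture sum to 1 in every bin. Unlike a magnitude-ratio mask, individual values can be negative or greater than 1.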
The cost function [20] of the neural network based on fully connected LSTM is
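A plausible form of this cost, consistent with the phase-sensitive mask and with the description of the cost as the sum of a heart-sound error term and a lung-sound error term, is sketched below. The notation is assumed: Y, H, and L are the spectrograms of the mixture, heart sound, and lung sound, the thetas are their phases, and the M terms are the estimated masks. The squared Frobenius norm and the absence of a normalization factor are also assumptions.

```latex
J = \left\| \hat{M}_h \odot |Y| - |H| \odot \cos(\theta_H - \theta_Y) \right\|_F^2
  + \left\| \hat{M}_l \odot |Y| - |L| \odot \cos(\theta_L - \theta_Y) \right\|_F^2
```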
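The performance of the proposed separation method based on fully connected LSTM is evaluated on a self-constructed cardiorespiratory sound dataset. The dataset is drawn from public datasets [21-28] and contains 111 clean heart-sound signals and 36 clean lung-sound signals. Because the signals selected from the different public datasets have different lengths and sampling rates, they are truncated and downsampled to a length of 10 s and a sampling rate of 2 kHz. The input to the simulation experiments is a mixed cardiorespiratory signal with a signal-to-noise ratio (SNR) of 0 dB, synthesized by linearly mixing clean heart-sound and lung-sound signals at a 1:1 energy ratio. Separation performance is evaluated by computing the SNR of the separated heart-sound and lung-sound signals; the higher the SNR, the better the separation.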
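The 0 dB mixing and the SNR metric can be sketched as follows. The SNR formula used here (reference energy over error energy, in dB) is a common definition and is assumed, since the paper does not spell it out; the sine waves stand in for real recordings and the function names are hypothetical.

```python
import numpy as np

def mix_equal_energy(heart, lung):
    """Scale the lung sound to the heart sound's energy, then add (1:1 energy, 0 dB)."""
    gain = np.sqrt(np.sum(heart ** 2) / np.sum(lung ** 2))
    lung_scaled = gain * lung
    return heart + lung_scaled, lung_scaled

def snr_db(reference, estimate):
    """SNR in dB: reference energy over the energy of (reference - estimate)."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

fs = 2000                                   # 2 kHz sampling rate
t = np.arange(10 * fs) / fs                 # 10 s of samples
heart = np.sin(2 * np.pi * 50 * t)          # stand-in for a clean heart sound
lung = 0.3 * np.sin(2 * np.pi * 200 * t)    # stand-in for a clean lung sound

mixture, lung_scaled = mix_equal_energy(heart, lung)
input_snr = snr_db(heart, mixture)          # SNR of the unprocessed mixture
```

Treating the raw mixture as the estimate of the heart sound gives an SNR of 0 dB, which is the baseline the separated signals are measured against.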
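In the experiments, the short-time Fourier transform (STFT) uses a Hanning window of 128 samples with a hop size of 32 samples. The spectrogram obtained by applying the STFT to the mixed signal is fed frame by frame into the separation network based on fully connected LSTM; each frame contains 129 frequency features. The first layer of the network is a fully connected layer with 64 neurons; the following 3 fully connected LSTM layers contain 64, 128, and 256 neurons, respectively; the last layer is a fully connected layer with 129 neurons, and the network outputs the heart-sound and lung-sound time-frequency masks for each frame.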
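A 128-sample window on its own yields 65 frequency bins, so the 129 features per frame suggest a 256-point FFT with zero-padding. That FFT length is an assumption, not something the paper states. Under that assumption, the framing can be sketched in NumPy (the function name and the absence of edge padding are also assumptions):

```python
import numpy as np

def stft_frames(x, win_len=128, hop=32, n_fft=256):
    """Frame the signal, apply a Hanning window, and take a zero-padded real FFT."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + win_len] * window for i in range(n_frames)]
    )
    # rfft with n=n_fft zero-pads each 128-sample frame to 256 points,
    # giving n_fft // 2 + 1 = 129 frequency bins.
    return np.fft.rfft(frames, n=n_fft, axis=1)

fs = 2000
x = np.random.default_rng(0).standard_normal(10 * fs)   # stand-in for a 10 s mixture
spec = stft_frames(x)
features = np.abs(spec)                                  # one 129-dim vector per frame
```

For a 10 s signal at 2 kHz this gives 622 frames of 129 features each, which would be fed to the network one frame at a time.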
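Three-fold cross-validation is used to evaluate the separation performance of the network. The 111 heart-sound signals and the 36 lung-sound signals are each divided evenly into 3 parts; in each fold, 1 part is used for testing and the other 2 for training, and the final result is the average over the three folds. To compare linear and nonlinear separation methods, supervised non-negative matrix factorization (SNMF) is used as the baseline. To test whether temporal information improves separation, a fully connected neural network (F-NN) is designed by replacing the LSTM layers of the fully connected LSTM network (F-LSTM) with fully connected layers, keeping all other model parameters the same as in F-LSTM.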
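The three-fold split can be sketched as below. The random shuffling, the seeds, and the function name are assumptions; the paper only states that the signals are divided evenly into 3 parts.

```python
import numpy as np

def three_fold_splits(n_items, seed=0):
    """Return 3 (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_items), 3)
    splits = []
    for k in range(3):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(3) if j != k])
        splits.append((train_idx, test_idx))
    return splits

heart_splits = three_fold_splits(111)           # 111 heart-sound signals
lung_splits = three_fold_splits(36, seed=1)     # 36 lung-sound signals
```

Each heart-sound fold holds 37 signals and each lung-sound fold holds 12; the reported score would be the mean of the metric over the three test folds.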
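Table 1 compares the separation performance of F-LSTM, F-NN, and SNMF. Both F-NN and F-LSTM outperform SNMF. Compared with SNMF, F-LSTM improves the SNR of the separated heart sounds and lung sounds by 1.69 dB and 1.73 dB, respectively, which indicates that a nonlinear separation model outperforms a linear one for cardiorespiratory sound separation. Compared with F-NN, F-LSTM improves the SNR of the separated heart sounds and lung sounds by 0.67 dB and 0.76 dB, respectively, which indicates that exploiting temporal information further improves separation. In summary, the proposed F-LSTM network achieves the best separation performance of the three methods.
Table 1 Comparison of experimental results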
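This paper applies the LSTM network to cardiorespiratory sound separation in order to handle nonlinear mixing of the heart-sound and lung-sound components and to capture their temporal correlation, thereby improving separation. To reduce the number of network parameters and speed up training convergence, the LSTM network adopts a fully connected structure. Simulation results on public cardiorespiratory sound datasets show that the nonlinear separation models achieve better separation than the linear model, and that exploiting the temporal correlation of the heart-sound and lung-sound components further improves separation performance.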
[1] World Health Statistics 2017: monitoring health for the SDGs, Sustainable Development Goals[R]. World Health Organization, Tech. Rep., 2017.
[2] Gavriely N, Palti Y, Alroy G. Spectral characteristics of normal breath sounds[J]. Journal of Applied Physiology, 1981,50(2): 307–314.
[3] Sathesh K, Muniraj N J R. Real time heart and lung sound separation using adaptive line enhancer with NLMS[J]. Journal of Theoretical and Applied Information Technology, 2014, 65(2):559–564.
[4] Nersisson R, Noel M M. Heart sound and lung sound separation algorithms: a review[J]. Journal of Medical Engineering and Technology, 2017,41(1):13–21.
[5] Nersisson R, Noel M M. Hybrid Nelder-Mead search based optimal least mean square algorithms for heart and lung sound separation[J]. Engineering Science and Technology, 2017,20:1054–1065.
[6] Liu Feng, Wang Yutai, Wang Yanxiang. Research and implementation of heart sound denoising[J]. Physics Procedia, 2012,25:777–785.
[7] Gradolewski D, Redlarski G. Wavelet-based denoising method for real phonocardiography signal recorded by mobile devices in noisy environment[J]. Computers in Biology and Medicine, 2014,52:119–129.
[8] Mondal A, Saxena I, Tang H, et al. A noise reduction technique based on nonlinear kernel function for heart sound analysis[J]. IEEE Journal of Biomedical and Health Informatics, 2017, doi:10.1109/JBHI.2017.2667685.
[9] Chien J C, Huang M C, Lin Y D, et al. A study of heart sound and lung sound separation by independent component analysis technique[C]. in 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS’06). IEEE, 2006:5708–5711.
[10] Ayari F, Ksouri M, Alouani A T. Lung sound extraction from mixed lung and heart sounds FASTICA algorithm[C]. in 16th IEEE Mediterranean Electrotechnical Conference (MELECON). IEEE, 2012:339–342.
[11] Makkiabadi B, Jarchi D, Sanei S. A new time domain convolutive BSS of heart and lung sounds[C]. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012:605–608.
[12] Shah G, Koch P, Papadias C B. On the blind recovery of cardiac and respiratory sounds[J]. IEEE Journal of Biomedical and Health Informatics, 2015,19(1):151–157.
[13] Zhang Ge, Zhang Pengyuan, Pan Jielin, et al. Fast decoding algorithm for speech recognition based on recurrent neural networks[J]. Journal of Electronics and Information Technology, 2017,39(4):930–937. (in Chinese)
[14] Han Wei, Zhang Xiongwei, Min Gang, et al. Single-channel speech enhancement method based on a perceptual-masking deep neural network[J]. Acta Automatica Sinica, 2017,43(2):248–258. (in Chinese)
[15] Wang Yannan. Speaker-independent single-channel speech separation based on deep learning[D]. Hefei: University of Science and Technology of China, 2017. (in Chinese)
[16] Chen P Y, Selesnick I W. Translation-invariant shrinkage/thresholding of group sparse signals[J]. Signal Processing, 2014,94:476–489.
[17] Ren Zhihui, Xu Haoyu, Feng Songlin, et al. Sequence-labeling Chinese word segmentation method based on LSTM networks[J]. Application Research of Computers, 2017,34(5):1321–1324,1341. (in Chinese)
[18] Huang G, Liu Z, Maaten L V D, et al. Densely connected convolutional networks[C]. IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2017:2261–2269.
[19] Erdogan H, Hershey J R, Watanabe S, et al. Deep recurrent networks for separation and recognition of single channel speech in non-stationary background audio[M]. New Era for Robust Speech Recognition: Exploiting Deep Learning. Berlin, Germany: Springer, 2017.
[20] Kolbaek M, Yu D, Tan Z H, et al. Multi-talker speech separation with utterance-level permutation invariant training of deep recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017,25(10):1901–1913.
[21] Bentley P, Nordehn G, Coimbra M, et al. The PASCAL Classifying heart sounds challenge[DB/OL]. (2011-11-01) [2018-12-20]. http://www.peterjbentley.com/heartchallenge.
[22] PhysioNet. Classification of normal/abnormal heart sound recordings: the physionet/computing in cardiology challenge 2016[DB/OL]. (2018-08-13) [2018-12-20]. https://physionet.org/challenge/2016.
[23] Welch Allyn. Student clinical learning[DB/OL]. (2018-12-20) [2018-12-20]. https://www.welchallyn.com/content/welchallyn/americas/en/students.html.
[24] Easy Auscultation. Heart and lung sounds reference guide[DB/OL]. (2018-12-20) [2018-12-20]. https://www.easyauscultation.com/heart-sounds.
[25] Open Michigan. Heart sound and murmur library[DB/OL]. (2015-04-14) [2018-12-20]. https://open.umich.edu/find/open-educational-resources/medical/heart-sound-murmur-library.
[26] East Tennessee State University. Pulmonary breath sounds[DB/OL]. (2002-11-25) [2018-12-20]. http://faculty.etsu.edu/arnall/www/public_html/heartlung/breathsounds/contents.html.
[27] Medical Training and Simulation LLC. Breath sounds reference guide[DB/OL]. (2018-12-20) [2018-12-20]. https://www.practicalclinicalskills.com/breath-sounds-reference-guide.
[28] PixSoft. The R.A.L.E. Repository[DB/OL]. (2018-12-20) [2018-12-20]. http://www.rale.ca.
Cardiorespiratory Sound Separation Method Based on Fully Connected Long Short-Term Memory Network
Lei Zhibin, Chen Junlin
(Guangdong University of Technology)
To address the mutual interference between cardiac sounds and respiratory sounds, researchers have proposed a cardiorespiratory sound separation method based on non-negative matrix factorization (NMF). However, this method assumes that cardiac and respiratory sounds are mixed linearly in the time-frequency domain, and it does not exploit the temporal correlation of cardiorespiratory sounds. In this paper, the long short-term memory (LSTM) network is therefore applied to cardiorespiratory sound separation to handle the nonlinear mixing of cardiorespiratory sounds and to capture their temporal correlation, improving separation performance. To reduce the number of network parameters and speed up training, the LSTM layers are constructed in a fully connected structure. The experimental results show that the LSTM network achieves better cardiorespiratory sound separation performance than the supervised NMF method.
Keywords: Cardiorespiratory Sound Separation; Non-Negative Matrix Factorization; Long Short-Term Memory Network
Lei Zhibin, male, born in 1994, is a master's student. His main research areas are pattern recognition, machine learning, and biosignal processing. E-mail: leizhibin_er@163.com