唐美麗 胡瓊 馬廷淮
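Abstract: Speech recognition, an indispensable part of artificial intelligence research, has gradually permeated people's daily lives. Because traditional speech recognition methods cannot adequately recognize complex, variable, speaker-independent speech, this paper proposes building a speech recognition model with a recurrent neural network (RNN), which captures strong correlations along a time series. Considering the rich time-frequency information carried by speech signals, the feature extraction stage is improved: the wavelet transform (WT), which offers better time-frequency resolution, replaces the fast Fourier transform (FFT) in producing the model input. The backpropagation through time (BPTT) algorithm is then used for feature learning and training. In the experiments, the effect of wavelet-based feature extraction on recognition performance is first analyzed by comparison; the recognition rate is then compared with those of a traditional HMM model and a BP neural network, verifying that the RNN can improve the accuracy and stability of speech recognition.
Keywords: speech recognition; recurrent neural network; backpropagation algorithm; feature extraction; wavelet transform; HMM model; BP neural network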
CLC number: TN912-34; TP391.1    Document code: A    Article ID: 1004-373X(2019)14-0152-05
Research on speech recognition based on recurrent neural network
TANG Meili, HU Qiong, MA Tinghuai
(Nanjing University of Information Science & Technology, Nanjing 210044, China)
Abstract: Speech recognition, an indispensable part of artificial intelligence research, has gradually penetrated people's daily lives. To address the problem that traditional speech recognition methods cannot properly recognize complex, speaker-independent speech, this paper proposes a speech recognition model based on a recurrent neural network (RNN), which captures strong correlations in time series. In consideration of the abundant time-frequency information in speech signals, the feature extraction process is improved: the wavelet transform (WT), which has better time-frequency resolution, replaces the fast Fourier transform (FFT) in producing the input of the model. The backpropagation through time (BPTT) algorithm is adopted for feature learning and training. In the experiments, the influence of wavelet-based feature extraction on recognition performance was analyzed by comparison, and the recognition rate of the proposed model was compared with those of the traditional HMM model and the BP neural network. The results show that the RNN improves the accuracy and stability of speech recognition to a certain extent.
Keywords: speech recognition; recurrent neural network; back propagation algorithm; feature extraction; wavelet transform; HMM model; BP network
0 Introduction
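With the rapid development of artificial intelligence, speech recognition has become a favored pivotal tool for human-computer interaction and has already seen initial application in mobile phones, in-vehicle systems, search engines, robots, e-commerce, and other fields. The flourishing of these applications has driven continual renewal and refinement of research on speech recognition. Traditional template-matching and statistical-learning methods have matured and have even reached a bottleneck [1], whereas speech recognition with artificial neural networks is on the rise because of its outstanding results. One advantage of using artificial neural networks to learn and process speech is that their working principle imitates the activity of neurons in the human brain: nodes are connected into a network structure, which is combined with an adaptive algorithm to complete the recognition process. In addition, neural networks can map the nonlinear relationships among complex speech signals and have a strong capacity for learning speech sequences [2-3]. Speech signals have two important characteristics: they unfold along a time sequence, and they contain rich time-frequency information. Traditional acoustic models analyze the internal states of each phone but ignore the mutual influence between phones. Commonly used artificial neural networks emphasize the connections between phones, but their internal states are not fully connected and are instead linked layer by layer. In view of these shortcomings, this paper adopts a recurrent neural network, which can compensate for the above deficiencies, for the study of speech recognition.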
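The contrast drawn above, between layer-by-layer connections and a network whose state at one frame feeds into the next, can be illustrated with a minimal sketch. It assumes a simple Elman-style recurrent layer with a tanh activation. The function name `rnn_forward`, the weight names, and all sizes are hypothetical choices for illustration and are not taken from the model described in this paper.

```python
import numpy as np

def rnn_forward(frames, W_in, W_rec, b):
    """Run a simple (Elman-style) recurrent layer over a sequence of feature frames."""
    hidden = np.zeros(W_rec.shape[0])   # initial hidden state h_0
    states = []
    for x_t in frames:                  # one feature vector per speech frame
        # The new state depends on the current frame and on the previous state,
        # so information from earlier frames can influence later ones.
        hidden = np.tanh(W_in @ x_t + W_rec @ hidden + b)
        states.append(hidden)
    return np.stack(states)

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n_features, n_hidden, n_frames = 4, 3, 5
W_in = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
frames = rng.normal(size=(n_frames, n_features))

states = rnn_forward(frames, W_in, W_rec, b)
print(states.shape)  # (5, 3): one hidden state per frame
```

If `W_rec` is set to zero, each output depends only on its own frame, which corresponds to a purely layer-wise network with no memory of earlier frames.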