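KANG Jintao, WANG Li, WANG Xiaodi, SHENG Hui, LI Jingyang, HUANG Wenlin
(Institute of Forensic Science, Ministry of Public Security; 2011 Plan Collaborative Innovation Center of Judicial Civilization, Beijing 100038)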
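In China, forensic speech and acoustics corresponds to voiceprint examination in the broad sense. Within forensic phonetic examination it covers speaker identification (forensic voice comparison), speaker profiling, and speech content recognition; within forensic acoustic examination it covers authenticity examination of recordings, noise reduction and speech enhancement, noise analysis, sound-source identification, and examination of recording equipment [1]. The scope of research abroad is broadly the same [2]. In 2017, speaker identification remained the core of the field, with new results in auditory analysis, phonetic-acoustic analysis, automatic recognition, and quality control. In speaker profiling, speech emotion analysis joined traditional characteristics such as sex and age as an important topic, and automatic methods developed rapidly. Researchers in many countries also broke new ground in recording authentication and in noise reduction and speech enhancement. This paper introduces representative 2017 results in four active areas: speaker identification, speaker profiling, authenticity examination of recordings, and noise reduction and speech enhancement.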
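In China, speaker identification is voiceprint examination in the narrow sense [3], and it is the main branch of forensic speech and acoustics practice [4]. In current international practice, the great majority of institutions and practitioners use expert examination that combines auditory and acoustic analysis [5], but some institutions have begun to bring automatic recognition into the field, working with semi-automatic (expert-supervised) or fully automatic methods [6-7]. In 2017, publications on speaker identification concentrated on auditory analysis methods, phonetic and acoustic feature analysis, the evidential value of speech features, the wording of conclusions, automatic recognition technology, and quality control and standardization of the examination process.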
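Auditory analysis is an important component of current speaker identification methodology [1,8-10] and has long been explicitly specified in many domestic and international norms and standards [11-16]. In 2017, Sundqvist et al. [17] designed an auditory analysis procedure and applied it in casework at the Swedish National Forensic Centre (NFC). To make auditory analysis more systematic and standardized, Lindh et al. [18] examined its reliability by comparing auditory analysis with automatic recognition on Finnish speakers, and used the results to improve the speaker identification workflow at the Finnish National Bureau of Investigation (NBI). Leinonen et al. [19] proposed building auditory feature sets for individual languages and made a first attempt for Swedish and Finnish. Land et al. [20] discussed the value of laughter for auditory analysis. On disguised voices, Skarnitzl and Růžičková [21-22] studied the common disguise strategies of Czech speakers and made a preliminary analysis of the auditory and acoustic features under each strategy, while Delvaux et al. [23] examined how auditory and acoustic features differ between disguise and impersonation.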
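The use of Vocal Profile Analysis (VPA) in speaker identification has been a focus of auditory analysis research in recent years [24-29], and many researchers continued to explore it in 2017. To simplify the analysis, Segundo et al. [30] designed a simplified VPA protocol and applied it to the auditory analysis of identical twins; Segundo et al. [31] verified the validity of the VPA protocol with native listeners of Spanish, German, and English. Klug [32] discussed improvements to the VPA scheme, proposing that the categories of its settings be refined on the basis of stronger training. Hughes et al. [33-34] examined VPA scores together with automatic recognition: fusing automatic systems based on Mel-frequency cepstral coefficients (MFCC) and long-term formant distributions (LTFD) improved performance only slightly, whereas adding the VPA scores increased recognition accuracy significantly.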
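Auditory analysis and phonetic-acoustic analysis are symbiotic and complementary [35-36]: phonetic-acoustic analysis not only provides quantitative support for auditory analysis but can also supply new features [3]. In phonetic-acoustic analysis, Heuven, Gold et al. [37-38] continued to analyze the acoustic features of filled pauses and hesitation markers in order to further explore their value for speaker identification. He et al. [39] studied how far between-speaker intensity variability is affected by noise or by the choice of frequency band; the results show that between-speaker differences are well preserved across the full band. How the acoustic features of bilingual speakers differ between their two languages is a long-standing research question, and Dorreen et al. [40] studied long-term fundamental frequency distributions in this setting. Arantes et al. [41] examined how language, speaking style, and other factors affect the time needed for the long-term fundamental frequency to stabilize; speaking style had the largest effect. Dimos, Lopez et al. [42-43] studied the rhythm, prosody, and spectral features of shouted speech. He et al. [44] studied the evidential value of intensity contours. Vowel spaces differ between languages; Varošanec-Škarić [45] studied the similarities and differences in the vowel spaces of male speakers of Croatian, Serbian, and Slovenian, providing a basis for speaker identification across languages. McDougall et al. [46] compared syllable-based and time-based methods of describing fluency. Wang et al. [47] studied the dynamic features of Chinese diphthongs; the results show that diphthongs also have high evidential value. Heeren [48] discussed the acoustic characteristics of [s] in different contexts in telephone recordings. On the construction of voice profiles, Franchini [49] studied the acoustic features of [l] as an example, and Fingerling [50] explored reconstructing the vowel set of a second-language speaker.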
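In speaker identification, the evidential value of a speech feature is a key consideration. Because speech features are dynamic, they show both within-speaker variation (differences within the same speaker) and between-speaker difference (differences between speakers); a feature with small within-speaker variation and large between-speaker difference has high evidential value. In 2017, attention to feature value focused mainly on the distribution of speech features in populations. Rhodes et al. [51] argued that, at the present stage, research on population distributions should be tied to actual cases. Hughes and Wormald [52] proposed building a WikiDialects database to hold high-value dialect features. Hughes et al. [53] raised four issues to consider when studying the population distribution of speech features (control factors, specificity, error, and degree of certainty) and used formant trajectories of the English diphthong [ai] to show how different distributions may affect the outcome of speaker identification. On whether speech features behave stably within the questioned and known samples, Ajili, building on earlier work [54-56], proposed a method that uses a homogeneity measure from information theory to quantify the stability of acoustic parameters [57].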
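The criterion of small within-speaker variation and large between-speaker difference can be made concrete with a simple variance ratio, similar in spirit to the F-ratio used in feature selection. The following is a minimal sketch, not a method from any of the cited studies; the function name `variance_ratio` and the mean-F0 values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def variance_ratio(samples_by_speaker):
    """Between-speaker variance of the speaker means divided by the
    average within-speaker variance. Larger values suggest a feature
    that separates speakers better."""
    speaker_means = [mean(values) for values in samples_by_speaker.values()]
    between = pvariance(speaker_means)
    within = mean(pvariance(values) for values in samples_by_speaker.values())
    return between / within


# Hypothetical mean-F0 measurements in Hz (illustrative values only).
stable_feature = {
    "A": [110, 112, 111],
    "B": [140, 141, 139],
    "C": [125, 124, 126],
}
unstable_feature = {
    "A": [110, 140, 125],
    "B": [139, 112, 126],
    "C": [124, 141, 111],
}

stable_ratio = variance_ratio(stable_feature)
unstable_ratio = variance_ratio(unstable_feature)
print(stable_ratio > unstable_ratio)
```

In this toy data the first feature varies little within each speaker and strongly between speakers, so its ratio is far larger than that of the second feature, whose values overlap across speakers.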
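How to express the conclusions of voiceprint examination has long been debated. Internationally, Rose and Morrison have consistently advocated a quantitative likelihood-ratio framework; most practitioners in the UK, such as Nolan, use the UK Position Statement format; and most practitioners in continental Europe use a verbal probability scale. China mostly uses a five-level probability scale [11].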
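In 2017, French [58] of the UK adjusted his form of expression, moving from the consistency and distinctiveness assessment of the UK Position Statement framework [59] toward a verbal probability scale. Under this framework, opinions fall into 13 levels, in line with the standard recommended by the Association of Forensic Science Providers [60]. Vermeulen [61] of the Netherlands Forensic Institute (NFI) described the basis for reaching a "strong support" conclusion: in casework, the NFI gives this opinion only when the features of the questioned and known samples are almost identical, or when the speaker has highly distinctive features such as a speech disorder.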
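The likelihood-ratio framework mentioned above compares how probable the observed evidence is under the same-speaker hypothesis with how probable it is under the different-speaker hypothesis. The sketch below illustrates the idea with a single feature and Gaussian models; the model parameters (means and standard deviations of the F0 difference between two recordings) are hypothetical assumptions, not values from the cited literature, and real casework uses multivariate features and calibrated systems.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def likelihood_ratio(evidence, same_model, diff_model):
    """LR = p(E | same speaker) / p(E | different speakers).
    Each model is a (mean, std) pair for the evidence value."""
    return gaussian_pdf(evidence, *same_model) / gaussian_pdf(evidence, *diff_model)


# Hypothetical models for the difference in mean F0 (Hz) between two samples.
SAME_SPEAKER = (0.0, 5.0)    # assumed: same-speaker differences are small
DIFF_SPEAKER = (0.0, 25.0)   # assumed: different-speaker differences spread widely

lr_small_difference = likelihood_ratio(2.0, SAME_SPEAKER, DIFF_SPEAKER)
lr_large_difference = likelihood_ratio(30.0, SAME_SPEAKER, DIFF_SPEAKER)
print(math.log10(lr_small_difference), math.log10(lr_large_difference))
```

A ratio above 1 supports the same-speaker hypothesis and a ratio below 1 supports the different-speaker hypothesis; verbal scales such as the one discussed above attach wording to ranges of this value.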
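Dedicated speech databases for forensic speech and acoustics currently include DyViS, built by Nolan in the UK [62]; FVCD, built by Morrison in Australia [63]; AHUMADA, built by Ramos in Spain [64]; NFI-FRITS, built by Vloed in the Netherlands [65]; and FABIOLE, built by Ajili in France [66]. China's National Public Security Voiceprint Database remains the voiceprint examination database with the largest number of speakers in the world. VoxCeleb [67], built in 2017, is a recent representative.
Mainstream automatic speaker recognition currently follows two frameworks: the Gaussian mixture model with a universal background model (GMM-UBM), and probabilistic linear discriminant analysis (PLDA) in the i-vector space, with deep neural networks (DNN) beginning to be used to extract speech features. The second framework is newer and was therefore the research focus in 2017. DNN-based feature extraction performs well but needs more training data; China's National Public Security Voiceprint Database already uses DNN-based feature extraction. Park et al. [68] introduced acoustic voice quality features into an automatic system of this architecture and, combined with MFCC features, markedly improved recognition of short utterances. To address the weakness of the existing log-likelihood ratio (LLR) in handling within-speaker variability, Solewicz et al. [69] proposed a new performance measure for automatic speaker recognition, the Null-Hypothesis LLR. Tschäpe [70] examined the errors of an i-vector system and found that adding regional information greatly reduced the error rate. Alexander et al. [71] designed an i-vector-based automatic recognition system for multi-speaker recordings. Milošević [72] combined segmental features (SF) such as fundamental frequency, formant frequencies, and formant bandwidths with an existing GMM-MFCC system and improved its recognition accuracy.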
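The GMM-UBM framework named above scores a test utterance by comparing its likelihood under a speaker-specific mixture model with its likelihood under the universal background model. The following is a minimal one-dimensional sketch of that scoring step only; the mixture parameters and frame values are hypothetical, and a real system would use multidimensional MFCC frames and a speaker model adapted from the UBM on enrolment data.

```python
import math


def gmm_logpdf(x, components):
    """Log density of a 1-D Gaussian mixture.
    components is a list of (weight, mean, std) tuples."""
    terms = [
        math.log(w) - math.log(s * math.sqrt(2 * math.pi)) - 0.5 * ((x - m) / s) ** 2
        for w, m, s in components
    ]
    top = max(terms)
    # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def gmm_ubm_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model vs. UBM."""
    total = sum(gmm_logpdf(x, speaker_gmm) - gmm_logpdf(x, ubm) for x in frames)
    return total / len(frames)


# Hypothetical models (illustrative values only).
UBM = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
SPEAKER = [(0.5, -0.2, 0.5), (0.5, 1.8, 0.5)]

target_frames = [-0.3, 1.7, 1.9, -0.1]     # close to the speaker model
impostor_frames = [-1.5, 0.8, -1.0, 3.0]   # better explained by the UBM

target_score = gmm_ubm_score(target_frames, SPEAKER, UBM)
impostor_score = gmm_ubm_score(impostor_frames, SPEAKER, UBM)
print(target_score, impostor_score)
```

A positive score means the frames fit the speaker model better than the background population, and a negative score means the opposite.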
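The role of automatic speaker recognition in speaker identification remains controversial. Although courts in Germany, Spain, Sweden, and other countries have accepted conclusions based on expert-supervised automatic recognition, the current performance of automatic systems means that this acceptance is limited in degree and remains hard to extend. In the UK, for example, French and Harrison of the JP French laboratory, acting as expert witnesses for the defence in the appeal R v Slade & Ors, submitted speaker identification evidence from both expert examination and an automatic recognition system, but the Court of Appeal rejected the conclusion of the automatic system. French [58] stated that this decision did not directly end hopes of using automatic system conclusions in the UK in the future. However, given the common-law tradition of precedent, unless automatic speaker recognition achieves a major technical breakthrough, not only the UK but also Commonwealth countries such as Canada, New Zealand, and Australia (52 countries in all) will reject the conclusions of automatic speaker recognition systems.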
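On quality control, French et al. [73] proposed an initiative to make the examinations of voiceprint laboratories transparent, which they called "opening the blinds", and described the examination workflow of the JP French laboratory in detail. Wagner [74] of the German BKA described its standard operating procedure for speaker identification and demonstrated it with actual cases. This trend toward transparency and standardization is the main direction of quality control in forensic speech and acoustics.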
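On standardization, China's Ministry of Public Security issued four public security industry standards for forensic speech and acoustics, covering four areas: speaker identification [11], authenticity examination of recordings [12], noise reduction and speech enhancement [13], and speaker profiling [14].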
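Speaker profiling means characterizing the social-group and individual attributes of a speaker who can be heard but not seen, or, when the person is seen but the identity is unknown, judging the speaker's social-group attributes through the same comprehensive analysis [4]. Voiceprint casework also involves characterizing a speaker's temporary and momentary states, for example analyzing from speech whether the speaker smokes or uses drugs, or inferring the speaker's psychological state through speech emotion analysis [75]. We include these under speaker profiling as well.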
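Cochlear implants have a frequency response of their own. Kovačić [76] studied how cochlear implants process sound signals and explored their potential for recognizing a speaker's sex, body size, and identity. Georg [77] studied how different German dialects affect age estimation and which dialect-related factors influence the estimate. Tomić [78] studied how to infer a speaker's regional origin from negative transfer in accent. Jong-Lendle et al. [79] studied how to infer a speaker's native language from a foreign accent in German. Schwab et al. [80] studied the effect of smoking on the voice, and Rodmonga et al. [81] studied the perceptual features of speech after drug use; both sets of results can be used to analyze a speaker's physical state. On automatic profiling, Kelly et al. [82] designed an i-vector-based automatic speaker profiling system that can analyze a speaker's sex, age, and language. Watt et al. [83] compared automatic accent recognition with accent recognition by human listeners.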
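On speech emotion analysis, Kathiresan et al. [84] studied the emotional information carried by MFCCs. Hippey et al. [85] explored ways of identifying remorse in speech. Bizozzero et al. [86] studied fear in a female speaker's voice, focusing on how fundamental frequency, speech rate, and pitch range affect the perception of fear. Satt et al. [87] designed a method that recognizes emotion directly from spectrograms using two kinds of neural network, convolutional and recurrent. Zhang et al. [88] designed an emotion interaction and transition (EIT) model for dialogue speech that mines emotional information from the interactions and transitions in a conversation; their algorithm improved accuracy and precision over traditional methods by 18.8% and 22.6% respectively. Parthasarathy, Le et al. [89-90] explored the use of multi-task learning, a deep learning technique, in speech emotion recognition. Beyond general emotion recognition, voice-based deception detection is also an active topic. Schroder [91] used analysis-by-synthesis to combine different phonation types, speech rates, tremolo, and fundamental frequency with neutral utterances, and assessed the credibility of each resulting utterance. The results show that credibility rises sharply when tremolo and breathiness increase, and falls when pauses and fundamental frequency increase. Mendels [92] used the CXD corpus to compare how well spectral, acoustic-prosodic, and lexical feature sets characterize deception, and tested these sets with hybrid deep models.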
Authenticity examination of recordings means analyzing recorded material by means of phonetics and acoustics, electromagnetics, signal processing, and related techniques in order to conclude whether it has been edited [4].
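In 2017, Ali et al. [93] developed an automatic system based on psychoacoustic principles that reached an accuracy of 99.2%. To address cases in which the original recording device cannot be obtained, Grigoras et al. [94] gave a comprehensive account of the file structures and formats of 125 recording devices and 40 commercial recording software products from an 18-year period. Smith et al. [95] studied audio files on iOS and built a decision-tree-based examination workflow for such files. Patole et al. [96] discussed the value of reverberation and other noise information in a recording for authenticity examination.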
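Electric network frequency (ENF) detection is an active topic in the authenticity examination of recordings. For its principles and details, see the earlier literature [97-99]. Hua et al. [100] discussed some common practical issues in ENF examination. James et al. [101] developed a cloud-based portable ENF system that removes the geographic constraints on examination. Hua et al. [102] proposed using the absolute error map to relate the questioned audio to the ENF information in an ENF database, and built two algorithms on this basis. Reis et al. [103] developed an ESPRIT-Hilbert-based method for detecting ENF, with results far better than those of other methods.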
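The core of ENF analysis is tracking the mains hum frequency over time so that the track can be compared with a reference database or checked for discontinuities. The following is a minimal sketch of the tracking step only, using a brute-force frequency scan around a nominal 50 Hz; it is not the ESPRIT-Hilbert or absolute-error-map method of the cited papers. The nominal frequency, search span, step, frame length, and the synthetic hum are all assumptions made for illustration.

```python
import math


def estimate_enf(samples, fs, nominal=50.0, span=0.5, step=0.01):
    """Return the frequency within nominal +/- span (Hz) whose
    single-frequency DFT power is largest for this frame."""
    best_freq, best_power = nominal, -1.0
    steps = int(round(2 * span / step))
    for i in range(steps + 1):
        f = nominal - span + i * step
        re = sum(x * math.cos(2 * math.pi * f * n / fs) for n, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * f * n / fs) for n, x in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = f, power
    return best_freq


def enf_track(samples, fs, frame_seconds=2.0):
    """Estimate one ENF value per non-overlapping frame."""
    frame = int(frame_seconds * fs)
    return [
        estimate_enf(samples[start:start + frame], fs)
        for start in range(0, len(samples) - frame + 1, frame)
    ]


# Synthetic hum: 2 s at 49.98 Hz followed by 2 s at 50.03 Hz (assumed values).
fs = 400
hum = [math.sin(2 * math.pi * 49.98 * n / fs) for n in range(2 * fs)]
hum += [math.sin(2 * math.pi * 50.03 * n / fs) for n in range(2 * fs)]

track = enf_track(hum, fs)
print([round(f, 2) for f in track])
```

In practice the recording would first be band-pass filtered around the nominal frequency, and an abrupt jump in the track, or a mismatch with the reference track for the claimed recording time, would prompt closer examination.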
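In China, Cao Wencheng [104] proposed two new algorithms for detecting speech forgery, both with miss rates below 10%. Sun Mengmeng [105] proposed co-occurrence vector features suited to audio detection; the method based on these features reaches an accuracy of 95%. Shen Xiaohu et al. [106] systematically analyzed the basic principles of digital audio tampering methods, used several spectral analysis methods to find the traces that tampering leaves in audio files, and established an effective spectral examination method.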
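Noise reduction and speech enhancement is a processing technique that combines computer and acoustic technology to reduce the noise and strengthen the speech in recorded material. The main algorithms at present are adaptive noise cancellation, statistical-model-based algorithms, spectral subtraction, auditory masking, short-time spectral estimation, subspace algorithms, and wavelet transform algorithms [4]. In 2017, DNN-based noise reduction and speech enhancement became the focus of research.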
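Of the classical algorithms listed above, spectral subtraction is the simplest to illustrate: a noise magnitude spectrum, estimated from a speech-free segment, is subtracted from the magnitude spectrum of the noisy frame, and the noisy phase is reused for resynthesis. The sketch below processes a single frame with a plain DFT and is only an illustration under stated assumptions; the frame length, the synthetic tone and noise, and the `floor` parameter are hypothetical, and practical implementations use windowed overlapping frames and an FFT.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_frame, floor=0.0):
    """Subtract the noise magnitude spectrum from the noisy magnitude
    spectrum, keep the noisy phase, and resynthesize the frame."""
    noisy_spec = dft(noisy_frame)
    noise_mag = [abs(c) for c in dft(noise_frame)]
    cleaned = []
    for c, nm in zip(noisy_spec, noise_mag):
        mag = max(abs(c) - nm, floor * abs(c))  # never below the spectral floor
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)


def energy(x):
    return sum(v * v for v in x)


# Synthetic example: a tone plus Gaussian noise (assumed values).
random.seed(0)
n = 64
clean = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
noisy = [s + random.gauss(0, 0.3) for s in clean]
noise_only = [random.gauss(0, 0.3) for _ in range(n)]  # stands in for a speech-free segment

enhanced = spectral_subtract(noisy, noise_only)
print(energy(noisy), energy(enhanced))
```

Because every spectral magnitude is reduced or left unchanged, the output energy cannot exceed the input energy; a nonzero `floor` is a common way to limit the "musical noise" that hard zeroing can introduce.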
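On dereverberation and echo cancellation, Guzewich et al. [107] studied a new DNN-based dereverberation method. Earlier work [108-111] had already made some progress in DNN-based dereverberation; with the new method, the equal error rate of a speaker comparison system on the processed audio fell from 9.2% to 6.8%. Bulling et al. [112] proposed a new method for removing echo from recordings that raises the maximum stable gain (MSG) of the signal by 30 dB.
On speech enhancement, Wu et al. [113] proposed a difference-compensation post-filtering method based on the locally linear embedding (LLE) algorithm. Ogawa et al. [114] extracted bottleneck features from a DNN-based acoustic model (DNN-AM) and then used example search to remove highly non-stationary noise from single-channel audio. Gelderblom et al. [115] proposed a subjective evaluation method for DNN-based speech enhancement algorithms. Working directly on the raw audio, Qian et al. [116] used a Bayesian WaveNet and also obtained good enhancement results.
On noise reduction, Pascual et al. [117] used a generative adversarial network, a type of deep network, and demonstrated its effectiveness with both subjective and objective evaluation. Maiti et al. [118] used two networks together for concatenative resynthesis, greatly increasing processing speed. It is worth noting that in forensic practice background noise may contain useful information and may need to be preserved or even enhanced during noise reduction. Practitioners therefore need to combine several methods to reduce the target noise while preserving the useful information, and the flexibility of some of the deep learning methods above gives them an advantage here.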
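The equal error rate (EER) quoted for the dereverberation result is the operating point at which a verification system's false acceptance rate equals its false rejection rate. The following is a minimal sketch of how it can be estimated from comparison scores by sweeping a threshold; the scores are hypothetical and the function is an illustration, not the evaluation code of the cited study.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Sweep a decision threshold over all observed scores and return the
    error rate where false rejection and false acceptance are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Same-speaker trials scoring below the threshold are falsely rejected.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # Different-speaker trials at or above the threshold are falsely accepted.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical comparison scores (higher means more similar).
same_speaker = [2.1, 1.8, 0.4, 2.5, 1.2, 0.9, 3.0, 1.6, 2.2, 1.1]
different_speaker = [-1.0, 0.2, -0.5, 1.0, -2.0, 0.1, -0.3, 0.6, -1.2, -0.8]

eer = equal_error_rate(same_speaker, different_speaker)
print(eer)
```

With these scores, a threshold of 0.9 rejects one of the ten same-speaker trials and accepts one of the ten different-speaker trials, so both error rates are 10% at that point.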
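[1] LI Jingyang. Audio and video evidence technology, Chapter 2: Sound evidence technology[M]// LI Xuejun. New Physical Evidence Technology. Beijing: Beijing Jiaotong University Press, 2015: 339-360. (in Chinese)
[2] HOLLIEN H. The acoustics of crime: the new science of forensic phonetics[M]. New York: Plenum Press, 1990.
[3] CAO Honglin, LI Jingyang, WANG Yingli, et al. On the forms of expressing voiceprint examination conclusions[J]. Evidence Science, 2013, 21(5): 605-624. (in Chinese)
[4] WANG Yingli, LI Jingyang, CAO Honglin. A review of voiceprint examination technology[J]. Police Technology, 2012(4): 54-56. (in Chinese)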
[5] ERIKSSON A. Aural/acoustic vs. automatic methods in forensic phonetic casework[M]// NEUSTEIN A, PATIL H A. Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism. New York: Springer, 2011: 41-69.
[6] GOLD E, FRENCH P. International practices in forensic speaker comparison[J]. International Journal of Speech Language and the Law, 2011, 18(2): 293-307.
[7] MORRISON G S, SAHITO F H, JARDINE G, et al. Interpol survey of the use of speaker identification by law enforcement agencies[J]. Forensic Science International, 2016, 263(3): 92-100.
[8] NOLAN F. The phonetic bases of speaker recognition[M]. Cambridge, UK: Cambridge University Press, 1983.
[9] HOLLIEN H, DIDLA G, HARNSBERGER J D, et al. The case for aural perceptual speaker identification[J]. Forensic Science International, 2016, 269(3): 8-20.
[10] ROSE P. Forensic speaker identification[M]. London: Taylor and Francis, 2002.
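[11] Ministry of Public Security of the People's Republic of China. Forensic sciences: Technical specification for speaker identification: GA/T 1433-2017[S]. Beijing: Standards Press of China, 2017. (in Chinese)
[12] Ministry of Public Security of the People's Republic of China. Technical specification for forensic authenticity examination of recordings: GA/T 1432-2017[S]. Beijing: Standards Press of China, 2017. (in Chinese)
[13] Ministry of Public Security of the People's Republic of China. Forensic sciences: Technical specification for noise reduction and speech enhancement: GA/T 1431-2017[S]. Beijing: Standards Press of China, 2017. (in Chinese)
[14] Ministry of Public Security of the People's Republic of China. Forensic sciences: Technical specification for speaker profiling: GA/T 1430-2017[S]. Beijing: China Quality Inspection Press, 2017. (in Chinese)
[15] Bureau of Judicial Expertise Administration, Ministry of Justice of the People's Republic of China. Specification for the examination of audio recordings: SF/Z JD0301001-2010[S]. Beijing: Standards Press of China, 2010. (in Chinese)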
[16] CAIN S. American Board of Recorded Evidence-Voice Comparison Standards[EB/OL]. (1998)[2017-10-15]. http://www.forensictapeanalysisinc.com/Articles/voice_comp.htm
[17] SUNDQVIST M, LEINONEN T, LINDH J, et al. Blind test procedure to avoid bias in perceptual analysis for forensic speaker comparison casework[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 45-47.
[18] LINDH J, NAUTSCH A, LEINONEN T, et al. Comparison between perceptual and automatic systems on Finnish phone speech data (FinEval1) - a pilot test using score simulations[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 86-87.
[19] LEINONEN T, LINDH J, AKESSON J. Creating linguistic feature set templates for perceptual forensic speaker comparison in Finnish and Swedish[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 126-128.
[20] LAND E, GOLD E. Speaker identification using laughter in a close social network[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 99-101.
[21] SKARNITZL R, RŮŽIČKOVÁ A. The malleability of speech production: An examination of sophisticated voice disguise[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 59-60.
[22] RŮŽIČKOVÁ A, SKARNITZL R. Voice disguise strategies in Czech male speakers[J]. AUC Philologica, Phonetica Pragensia, 2017.
[23] DELVAUX V, CAUCHETEUX L, HUET K, et al. Voice disguise vs. impersonation: Acoustic and perceptual measurements of vocal flexibility in non-experts[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3777-3781.
[24] JESSEN M. Speaker-specific information in voice quality parameters[J]. Forensic Linguistics, 1997, 4(1): 84-103.
[25] KÖSTER O, KÖSTER J P. The auditory-perceptual evaluation of voice quality in forensic speaker recognition[J]. The Phonetician, 2004, 89: 9-37.
[26] NOLAN F. Voice quality and forensic speaker identification[J]. Govor, 2007, 24(2): 111-128.
[27] KÖSTER O, JESSEN M, KHAIRI F, et al. Auditory-perceptual identification of voice quality by expert and non-expert listeners[C]. ICPhS XVI, 2007: 1845-1848.
[28] SEGUNDO E, ALVES H, TRINIDAD M F. CIVIL corpus: voice quality for speaker forensic comparison[J]. Procedia - Social and Behavioral Sciences, 2013, 95(4): 587-593.
[29] FRENCH P. Developing the vocal profile analysis scheme for forensic voice comparison[C]. York, UK: IAFPA, 2016.
[30] SEGUNDO E. A simplified vocal profile analysis protocol for the assessment of voice quality and speaker similarity[J]. Journal of Voice, 2017, 31(5): 11-27.
[31] SEGUNDO E, BRAUN A, HUGHES V, et al. Speaker-similarity perception of Spanish twins and non-twins by native speakers of Spanish, German and English[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 159-162.
[32] KLUG K. Refining the Vocal Profile Analysis (VPA) scheme for forensic purposes[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 190-191.
[33] HUGHES V, HARRISON P, FOULKES P, et al. Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3892-3896.
[34] HUGHES V, HARRISON P, FOULKES P, et al. The complementarity of automatic, semi-automatic, and phonetic measures of vocal tract output in forensic voice comparison[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 83-85.
[35] NOLAN F. Speaker identification evidence: its forms, limitations, and roles[C]// Proceedings of the conference "Law and Language: Prospect and Retrospect". University of Lapland, 2001.
[36] NOLAN F. Voice[M]// BOGAN P S, ROBERTS A. Identification: investigation, trial and scientific evidence. Jordan Publishing, 2011: 381-390.
[37] HEUVEN V, CORTES P. Speaker specificity of filled pauses compared with vowels and consonants in Dutch[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 48-49.
[38] GOLD E, ROSS S, EARNSHAW K. Delimiting the West Yorkshire population: Examining the regional-specificity of hesitation markers[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 50-52.
[39] HE L, DELLWO V. Between-speaker intensity variability is maintained in different frequency bands of amplitude demodulated signal[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 55-58.
[40] DORREEN K, PAPP V. Bilingual speakers' long-term fundamental frequency distributions as cross-linguistic speaker discriminants[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 61-64.
[41] ARANTES P, ERIKSSON A, GUTZEIT. Effect of language, speaking style and speaker on long-term f0 estimation[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3897-3901.
[42] DIMOS K, DELLWO V, HE L. Rhythm and speaker-specific variability in shouted speech[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 102-104.
[43] LOPEZ A, SAEIDI R, JUVELA L, et al. Normal-to-shouted speech spectral mapping for speaker recognition under vocal effort mismatch[C]// ICASSP. Proceedings of ICASSP2017. ICASSP, 2017: 4940-4944.
[44] HE L, DELLWO V. Speaker-specific temporal organizations of intensity contours[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 163-166.
[45] VAROŠANEC-ŠKARIĆ G, BAŠIĆ I, KIŠIČEK G. Comparison of vowel space of male speakers of Croatian, Serbian and Slovenian language[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 142-146.
[46] MCDOUGALL K, DUCKWORTH M. Fluency profiling for forensic speaker comparison: a comparison of syllable- and time-based approaches[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 129-131.
[47] WANG L, KANG J, LI J, et al. Speaker-specific dynamic features of diphthongs in Standard Chinese[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 91-95.
[48] HEEREN W. Speaker-dependency of /s/ in spontaneous telephone conversation[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 68-71.
[49] FRANCHINI S. Construction of a voice profile: An acoustic study of /l/[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 183-186.
[50] FINGERLING B. Constructing a voice profile: Reconstruction of the L1 vowel set for a L2 speaker[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 197-199.
[51] RHODES R, FRENCH P, HARRISON P, et al. Which questions, propositions and 'relevant populations' should a speaker comparison expert assess[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 40-44.
[52] HUGHES V, WORMALD J. WikiDialects: a resource for assessing typicality in forensic voice comparison[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 154-155.
[53] HUGHES V, FOULKES P. What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3772-3776.
[54] AJILI M, BONASTRE J, KHEDER W, et al. Phonetic content impact on forensic voice comparison[C]// Spoken Language Technology Workshop (SLT), 2016 IEEE. IEEE, 2016: 210-217.
[55] AJILI M, BONASTRE J, ROSSATTO S, et al. Inter-speaker variability in forensic voice comparison: a preliminary evaluation[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016: 2114-2118.
[56] DODDINGTON G, LIGGETT W, MARTIN A, et al. Sheep, goats, lambs and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation[C]// Tech. Rep. DTIC Document, 1998.
[57] AJILI M, BONASTRE J, KHEDER W, et al. Homogeneity measure impact on target and non-target trials in forensic voice comparison[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2844-2848.
[58] FRENCH P. A developmental history of forensic speaker comparison in the UK[J]. English Phonetics, 2017: 271-286.
[59] FRENCH P, HARRISON P. Position statement concerning use of impressionistic likelihood terms in forensic speaker comparison cases[J]. International Journal of Speech Language and the Law, 2007, 14(1): 137-144.
[60] Association of Forensic Science Providers. Standards for the formulation of evaluative forensic science expert opinion[J]. Science and Justice, 2009(49): 161-164.
[61] VERMEULEN J, CAMBIER-LANGEVELD T. Outstanding cases: about case reports with a "strong" conclusion[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 31-33.
[62] NOLAN F, MCDOUGALL K, JONG G D, et al. A forensic phonetic study of 'dynamic' sources of variability in speech: the DyViS project[C]// Proceedings of the 11th Australian International Conference on Speech Science & Technology. University of Auckland, 2006: 13-18.
[63] MORRISON G S, ZHANG C, ENZINGER E, et al. Forensic voice comparison databases[DB/OL]. 2015. http://www.forensic-voice-comparison.net/
[64] RAMOS D, GONZALEZ-RODRIGUEZ J, LUCENA-MOLINA J J. Addressing database mismatch in forensic speaker recognition with Ahumada III: a public real-casework database in Spanish[C]. International Speech Communication Association, 2008.
[65] VLOED V D, BOUTEN J, LEEUWEN D. NFI-FRITS: A forensic speaker recognition database and some first experiments[C]// Proceedings of Odyssey: The Speaker and Language Recognition Workshop. 2014: 6-13.
[66] AJILI M, BONASTRE J, ROSSATO S. FABIOLE, a speech database for forensic speaker comparison[C]// Proceedings of LREC-Conference, Slovenia. 2016: 726-733.
[67] NAGRANI A, CHUNG J, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset[J]. Sound, 2017.
[68] PARK S J, YEUNG G, KREIMAN J, et al. Using voice quality features to improve short-utterance, text-independent speaker verification systems[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1522-1526.
[69] SOLEWICZ Y, JESSEN M, VAN DER VLOED. Null-Hypothesis LLR: a proposal for forensic automatic speaker recognition[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2849-2853.
[70] TSCHÄPE N. Analysis of i-vector-based false-accept trials in a dialect labelled telephone corpus[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 65-67.
[71] ALEXANDER A. Not a lone voice: automatically identifying speakers in multi-speaker recordings[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 80-82.
[72] MILOŠEVIĆ M, GLAVITSCH U. Combining Gaussian mixture models and segmental feature models for speaker recognition[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2042-2043.
[73] FRENCH J, HARRISON P, KIRCHHÜBEL C, et al. From receipt of recordings to dispatch of report: opening the blinds on lab practices[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 29-30.
[74] WAGNER I. The BKA standard operation procedure of forensic speaker comparison and examples of case work[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 34-36.
[75] HAN Wenjing, LI Haifeng, RUAN Huabin, et al. Review on speech emotion recognition[J]. Journal of Software, 2014, 25(1): 37-50. (in Chinese)
[76] KOVAČIĆ D. Voice gender identification in cochlear implant users[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 23-25.
[77] GEORG A. The effect of dialect on age estimation[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 118-121.
[78] TOMIĆ K. Cross-language accent analysis for determination of origin[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 171-173.
[79] JONG-LENDLE G, KEHREIN R, URKE F, et al. Language identification from a foreign accent in German[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 135-138.
[80] SCHWAB S, AMATO M, DELLWO V, et al. Can we hear nicotine craving[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 115-117.
[81] RODMONGA P, TATIANA A, NIKOLAY B, et al. Perceptual auditory speech features of drug-intoxicated female speakers (preliminary results)[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 118-121.
[82] KELLY F, FORTH O, ATREYA A, et al. What your voice says about you: automatic speaker profiling using i-vectors[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 72-75.
[83] WATT D, JENKINS M, BROWN G. Performance of human listeners vs. the Y-ACCDIST automatic accent classifier in an accent authentication task[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 139-141.
[84] KATHIRESAN T, DELLWO V. Cepstral dynamics in MFCCs using conventional deltas for emotion and speaker recognition[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 105-108.
[85] HIPPEY F, GOLD E. Detecting remorse in the voice: A preliminary investigation into the perception of remorse using a voice line-up methodology[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 179-182.
[86] BIZOZZERO S, NETZSCHWITZ N, LEEMANN A. The effect of fundamental frequency f0, syllable rate and pitch range on listeners' perception of fear in a female speaker's voice[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 174-178.
[87] SATT A, ROZENBERG S, HOORY R. Efficient emotion recognition from speech using deep learning on spectrograms[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1089-1093.
[88] ZHANG R, ATSUSHI A, KOBASHIKAWA S, et al. Interaction and transition model for speech emotion recognition in dialogue[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1094-1097.
[89] PARTHASARATHY S, BUSSO C. Jointly predicting arousal, valence and dominance with multi-task learning[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1103-1107.
[90] LE D, ALDENEH Z, PROVOST E. Discretized continuous speech emotion recognition with multi-task deep recurrent neural network[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1108-1112.
[91] SCHRODER A, STONE S, BIRKHOLZ P. The sound of deception - what makes a speaker credible[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1467-1471.
[92] MENDELS G, LEVITAN S, LEE K. Hybrid acoustic-lexical deep learning approach for deception detection[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1472-1476.
[93] ALI Z, IMRAN M, ALSULAIMAN M. An automatic digital audio authentication/forensics system[J]. Digital Object Identifier, 2017(5): 2994-3007.
[94] GRIGORAS C, SMITH J. Large scale test of digital audio file structure and format for forensic analysis[C]. 2017 AES International Conference on Audio Forensics, 2017.
[95] SMITH J, LACEY D, KOENIG B, et al. Triage approach for the forensic analysis of Apple iOS audio files recorded using the "Voice Memos" app[C]. 2017 AES International Conference on Audio Forensics, 2017.
[96] PATOLE R, KORE G, REGE P. Reverberation based tampering detection in audio recordings[C]. 2017 AES International Conference on Audio Forensics, 2017.
[97] Advisory Panel of White House Tapes. The EOB Tape of June 20, 1972: Report on a Technical Investigation Conducted for the U.S. District Court for the District of Columbia[R]. 1974.
[98] GRIGORAS C. Application of ENF analysis method in forensic authentication of digital audio and video recordings[J]. Journal of the Audio Engineering Society, 2007, 57(9): 643-661.
[99] GRIGORAS C. Statistical tools for multimedia forensics[C]. 39th International Conference: Audio Forensics: Practices and Challenges, 2010.
[100] HUA G, THING V. On practical issues of electric network frequency based audio forensics[J]. IEEE Transactions on Information Forensics & Security, 2017(5): 20640-20651.
[101] JAMES Z, GRIGORAS C, SMITH J. A low cost, cloud based, portable, remote ENF system[C]. 2017 AES International Conference on Audio Forensics, 2017.
[102] HUA G, ZHANG Y, GOH J. Audio authentication by exploring the absolute-error-map of ENF signals[J]. IEEE Transactions on Information Forensics & Security, 2016(5): 1003-1016.
[103] REIS P M G, MIRANDA R, GALDO G. ESPRIT-Hilbert based audio tampering detection with SVM classifier for forensic analysis via electrical network frequency[J]. IEEE Transactions on Information Forensics & Security, 2017(4): 853-864.
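[104] CAO Wencheng. Research on blind detection of speech forgery[D]. Chengdu: Southwest Jiaotong University, 2017. (in Chinese)
[105] SUN Mengmeng. Recording authenticity identification and re-recording detection[D]. Shenzhen: Shenzhen University, 2017. (in Chinese)
[106] SHEN Xiaohu, JIN Tian, ZHANG Changzhen, et al. Research on spectral examination techniques for authenticating audio recordings[J]. Forensic Science and Technology, 2017, 42(3): 173-177. (in Chinese)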
[107] GUZEWICH P, ZAHORIAN S. Improving speaker verification for reverberant conditions with deep neural network dereverberation processing[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 171-175.
[108] HAN K, WANG Y, WANG D. Learning spectral mapping for speech dereverberation[C]// 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2014: 4661-4665.
[109] HAN K, WANG Y, WANG D, et al. Learning spectral mapping for speech dereverberation and denoising[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(6): 982-992.
[110] WU B, LI K, YANG M, et al. A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems[C]. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016.
[111] WU B, LI K, YANG M, et al. A reverberation-time-aware approach to speech dereverberation based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(1): 102-111.
[112] BULLING P, LINHARD K, WOLF A, et al. Stepsize control for acoustic feedback cancellation based on the detection of reverberant signal periods and the estimated system distance[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 176-180.
[113] WU Y C, HWANG H, WANG S, et al. A post-filtering approach based on locally linear embedding difference compensation for speech enhancement[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1953-1957.
[114] OGAWA A, KINOSHITA K, DELCROIX M, et al. Improved example-based speech enhancement by using deep neural network acoustic model for noise robust example search[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1963-1967.
[115] GELDERBLOM F B, GRONSTAD T, VIGGEN E. Subjective intelligibility of deep neural network-based speech enhancement[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1968-1972.
[116] QIAN K, ZHANG Y, CHANG S, et al. Speech enhancement using Bayesian WaveNet[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2013-2017.
[117] PASCUAL S, BONAFONTE A, SERRA J. SEGAN: Speech enhancement generative adversarial network[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3642-3646.
[118] MAITI S, MANDEL M. Concatenative resynthesis using twin networks[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3647-3651.