LI Shaobo, YANG Ling, YU Huihui, CHEN Yingyi
Abstract: A fast and accurate fish identification system requires both a well-performing recognition model and a sound deployment architecture. In recent years, convolutional neural networks (CNNs) have achieved great success in image recognition. Different CNN architectures have different strengths and weaknesses, so choosing and evaluating a CNN model among the many available structures becomes an unavoidable question. On the deployment side, running a deep learning model directly on a mobile terminal requires pruning and compressing the model, which degrades accuracy and inflates the installation package, making the model harder to upgrade and maintain. To address these problems, this study selected the AlexNet, GoogLeNet, ResNet and DenseNet pre-trained models for comparative experiments according to the characteristics of the real-time underwater fish identification task. The data were augmented by random flipping, rotation and color jittering on top of the public Ground-Truth fish dataset; label smoothing was used as the loss function to alleviate overfitting; and the Ranger optimizer with a cosine learning-rate decay strategy was investigated to further improve training. The precision and recall of each model on the training and validation sets were recorded, and the two were finally combined to quantify recognition performance. The results show that the fish identification model trained on DenseNet achieved the highest overall score, with precision and recall on the validation set reaching 99.21% and 96.77%, respectively, and an overall F1 score of 0.9742, so the model's theoretical recognition accuracy met expectations. A remote real-time underwater fish identification system was then developed and deployed in Python: the model runs on a remote server, and the mobile terminal invokes it through network requests. Actual tests with validation-set images show that, under good network conditions, the mobile terminal can correctly identify a fish and display its information within 1 s.
關(guān)鍵詞:魚類識(shí)別模型;卷積神經(jīng)網(wǎng)絡(luò);模型評(píng)價(jià);安卓;Ground-Truth ;實(shí)時(shí)識(shí)別系統(tǒng)
中圖分類號(hào):S964;TP183???????????? 文獻(xiàn)標(biāo)志碼:A??????????????? 文章編號(hào):SA202202006
Citation: LI Shaobo, YANG Ling, YU Huihui, CHEN Yingyi. Underwater fish species identification model and real-time identification system[J]. Smart Agriculture, 2022, 4(1): 130-139. (in Chinese with English abstract)
1 Introduction
魚類是最重要的水產(chǎn)品之一,世界上已知的魚類有28,000多種[1],研發(fā)快速、準(zhǔn)確的自動(dòng)化魚類識(shí)別系統(tǒng)在魚類知識(shí)科普、混合養(yǎng)殖、海洋監(jiān)管等方面具有重要意義。一個(gè)完整的自動(dòng)化魚類識(shí)別系統(tǒng)至少應(yīng)該包含識(shí)別方法的構(gòu)建和模型部署兩個(gè)方面。
在魚類識(shí)別方法方面,傳統(tǒng)的基于手工特征提取的魚類識(shí)別方法,主要依靠魚的紋理、顏色、形狀等特征進(jìn)行區(qū)分[2],雖然通過應(yīng)用尺度不變特征變換 ( Scale-Invariant? Feature? Trans‐form , SIFT )[3]、方向梯度直方圖(Histogram of Oriented Gradient , HOG )[4] 等方法取得了卓有成效的改進(jìn),但魚類特征需要依賴專家手動(dòng)設(shè)計(jì),當(dāng)識(shí)別另一種魚類時(shí)需要重新設(shè)計(jì)這些特征,方法不具有通用性且識(shí)別精度不高[5]。近年來,卷積神經(jīng)網(wǎng)絡(luò)( Convolution Neural Net‐ work , CNN )在圖像識(shí)別領(lǐng)域取得了巨大進(jìn)展[6] ,Krizhevsky等[7]提出的AlexNet卷積神經(jīng)網(wǎng)絡(luò)在 ImageNet Large? Scale Visual Recognition Challenge ( ILSVRC )比賽中大放異彩,一舉斬獲了 ILSVRC2012比賽冠軍。在AlexNet之后,研究者們陸續(xù)提出了許多優(yōu)秀的改進(jìn)的 CNN 模型,如VGGNet [8] 、GoogLeNet [9] 、ResNet [10]、DenseNet [11]等,這些模型在解決圖像分類問題上都取得了優(yōu)異的成績(jī)。CNN 系列算法已經(jīng)成為了完成圖像識(shí)別任務(wù)的最佳算法之一[12]。不同的 CNN模型有不同的特點(diǎn),沒有一種模型可以在所有場(chǎng)景下都優(yōu)于其它模型[13],例如 VGG‐ Net 相比AlexNet可以學(xué)習(xí)更多特征,但也大幅增加了模型大小[14];GoogLeNet的 Inception模塊大幅減少了模型參數(shù),但并沒有解決隨著深度的增加可能會(huì)出現(xiàn)的模型退化問題[15];ResNet緩解了隨著深度增加導(dǎo)致的梯度消失問題,但增加了實(shí)現(xiàn)難度并且引入了許多需要手工設(shè)置的權(quán)重[16]。對(duì)于具體的水下魚類識(shí)別任務(wù),CNN 結(jié)構(gòu)如何選擇和評(píng)價(jià)模型成為必須考慮的問題。
On the deployment side, installing the model directly on an Android terminal is one feasible option. With the growth of the mobile Internet and the spread of smart terminals, more and more people go online by phone. According to the Statistical Report on Internet Development in China, by December 2020 China had 986 million mobile Internet users, and 99.7% of all Internet users accessed the network by mobile phone[17]. In 2021, Android held nearly 90% of the mobile phone market[18], so implementing fish identification on Android terminals is convenient for users and greatly lowers their learning cost. However, the computing power of an Android terminal is limited compared with a server. A typical deep model must be pruned and compressed for mobile devices before deployment, which hurts model accuracy, and packing the model into the installation package noticeably inflates its size, hindering feature extension, maintenance and upgrades.
針對(duì)上述問題,本研究以實(shí)現(xiàn)高精度魚類識(shí)別系統(tǒng)為出發(fā)點(diǎn),嘗試通過研究多個(gè)主流 CNN結(jié)構(gòu),結(jié)合數(shù)據(jù)集特點(diǎn)和任務(wù)要求初步選擇模型,通過設(shè)計(jì)模型對(duì)比試驗(yàn)進(jìn)一步量化模型優(yōu)劣,并設(shè)計(jì)一套遠(yuǎn)程魚類識(shí)別系統(tǒng)完成模型部署,以實(shí)現(xiàn)一套完整的高精度魚類在線識(shí)別系統(tǒng)。
2 研究數(shù)據(jù)與方法
2.1系統(tǒng)設(shè)計(jì)
魚類識(shí)別系統(tǒng)的整體設(shè)計(jì)如圖1所示,用戶可以通過調(diào)用安卓手機(jī)攝像頭采集魚類圖像信息,通過網(wǎng)絡(luò)上傳魚類圖像至服務(wù)器進(jìn)行識(shí)別,服務(wù)器在成功識(shí)別魚類品種后查詢魚類信息數(shù)據(jù)庫獲取魚的簡(jiǎn)介、分布、習(xí)性、生長(zhǎng)周期等魚類知識(shí)庫信息返回給安卓客戶端顯示。
2.2數(shù)據(jù)處理
Images from the public Ground-Truth dataset[19] were used to train the models. The dataset contains 27,370 underwater images of 23 fish species, manually labeled by marine biologists; Fig. 2 shows an example image and the number of images for each species. The images were split into a training set of 80% (21,819 images) and a validation set of 20% (5,474 images). During training, data augmentation techniques such as random horizontal flipping, random rotation and color jittering were used to enlarge the effective sample size. Images were resized to 260×260 pixels as model input.
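A minimal sketch of this augmentation pipeline, assuming torchvision; the paper names the operations but not their magnitudes, so the rotation and jitter parameters below are illustrative assumptions:

```python
from torchvision import transforms

# Training-time augmentation: the operations match the paper,
# the magnitudes are assumed values.
train_tf = transforms.Compose([
    transforms.Resize((260, 260)),            # model input size used in the paper
    transforms.RandomHorizontalFlip(p=0.5),   # random horizontal flipping
    transforms.RandomRotation(degrees=15),    # random rotation (angle assumed)
    transforms.ColorJitter(brightness=0.2,    # color jittering (factors assumed)
                           contrast=0.2,
                           saturation=0.2),
    transforms.ToTensor(),
])
```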
2.3 Fundamentals
2.3.1 卷積神經(jīng)網(wǎng)絡(luò)模型
卷積神經(jīng)網(wǎng)絡(luò)是圖像處理領(lǐng)域廣泛應(yīng)用的深度學(xué)習(xí)網(wǎng)絡(luò),本質(zhì)上是一種層次網(wǎng)絡(luò)[20],其主要的特點(diǎn)是包含卷積層和池化層,原始輸入圖像的具象信息通過層層卷積和池化操作被提取成高層次抽象信息[21],到最后一層后根據(jù)損失函數(shù)計(jì)算預(yù)測(cè)值和真實(shí)結(jié)果的誤差再逐層向前傳播矯正模型參數(shù),直到模型收斂[22]。卷積和最大池化操作如圖3所示。
卷積操作的計(jì)算公式為
其中,L表示神經(jīng)網(wǎng)絡(luò)層數(shù);Mj表示輸入特征圖;Ki,j(L)表示卷積核;bj(L)表示偏置;f (·)表示激活函數(shù)。
AlexNet, VGGNet, GoogLeNet, ResNet and DenseNet are typical representatives of CNN models. According to the survey in [24], their model sizes are listed in Table 1.
2.3.2 Confusion matrix
The confusion matrix is a visual matrix for comparing predicted classes against true classes[25]. The total of each column is the number of samples predicted as that class, and the total of each row is the number of samples that actually belong to that class. The confusion matrix of a binary classification task is shown in Fig. 4.
Here, TN (True Negative) is the number of samples correctly predicted as the negative class, FN (False Negative) the number wrongly predicted as the negative class, FP (False Positive) the number wrongly predicted as the positive class, and TP (True Positive) the number correctly predicted as the positive class.
在混淆矩陣的基礎(chǔ)上延伸出了精確率( Pre‐cision)、召回率( Recall )、準(zhǔn)確率(Accuracy)等指標(biāo),精確率的計(jì)算公式為:
召回率的計(jì)算公式為:
Recall =
準(zhǔn)確率的計(jì)算公式為:
TP + TN
Accurary =
平衡分?jǐn)?shù) F1值是綜合考慮了精確度、召回率的調(diào)和值,一般 F1值越大說明模型效果越好,其計(jì)算公式為:
2.4模型選擇與構(gòu)建
To obtain a satisfactory identification model, a CNN must be trained on a large dataset[26], and the more complex the identification task, the larger the dataset required[27]. For the real-time underwater fish identification task, the complex underwater environment (changing light, refraction by water, etc.) means that underwater fish images often suffer from low resolution and heavy noise[2]. Moreover, different species live in very different waters and have very different habits; some fish live only in the deep sea, so acquiring underwater fish images at scale is not easy. Because the available underwater fish datasets are limited, while well-known CNN models such as AlexNet and ResNet have already undergone long, large-scale training on image classification, transfer learning that starts from these pre-trained models and adapts them to a new task can effectively reduce the dependence on training set size and suits situations with scarce computing resources and small datasets[28]. This experiment therefore generates its models by transfer learning from pre-trained models. As Table 1 shows, the model produced by VGGNet is the largest, reaching 138 MB; a larger model implies more computation and hence slower recognition. Since the real-time fish identification task demands high recognition speed, the AlexNet, GoogLeNet, ResNet and DenseNet pre-trained networks were selected to train the fish identification models, fine-tuning the final fully connected layer of each model to fit the underwater fish identification task.
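A minimal fine-tuning sketch for one of the selected backbones, assuming torchvision; DenseNet-169 is shown, with the output resized to the 23 species in the dataset:

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights and adapt the final fully connected
# layer to the 23 fish species in the Ground-Truth dataset.
model = models.densenet169(pretrained=True)
model.classifier = nn.Linear(model.classifier.in_features, 23)

# The same idea applies to ResNet50, whose last layer is named `fc`:
# resnet = models.resnet50(pretrained=True)
# resnet.fc = nn.Linear(resnet.fc.in_features, 23)
```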
To alleviate overfitting, label smoothing[29] was chosen as the loss function. The Ranger optimizer[30] combines RAdam[31] and Lookahead[32] into a single optimizer and can thereby reach higher accuracy, so Ranger was chosen as the training optimizer. The learning rate was uniformly set to 0.0001 and decayed with a cosine strategy; the batch size was 64 and training ran for 30 epochs. The full hyper-parameter configuration is listed in Table 2.
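A minimal sketch of this training configuration. The label-smoothing loss is written out by hand because `nn.CrossEntropyLoss` gained a built-in `label_smoothing` argument only after PyTorch 1.7, which this paper uses; the `Ranger` import assumes the lessw2020/Ranger-Deep-Learning-Optimizer codebase cited in [30] is installed; and the smoothing factor 0.1 is an assumed value not given in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from ranger import Ranger  # assumption: optimizer from the repo cited in [30]

class LabelSmoothingCrossEntropy(nn.Module):
    """Cross-entropy with uniform label smoothing (a common formulation)."""
    def __init__(self, eps=0.1):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=-1)
        nll = F.nll_loss(log_probs, target)          # loss on the true class
        smooth = -log_probs.mean(dim=-1).mean()      # uniform prior over classes
        return (1 - self.eps) * nll + self.eps * smooth

criterion = LabelSmoothingCrossEntropy(eps=0.1)      # eps is an assumed value
optimizer = Ranger(model.parameters(), lr=1e-4)      # lr = 0.0001 as in Table 2
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)  # 30 epochs
```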
3 結(jié)果分析
根據(jù)表2參數(shù)設(shè)置,分別在AlexNet、GoogLeNet、ResNet50和 DenseNet169上進(jìn)行模型訓(xùn)練,并在驗(yàn)證集上測(cè)試了模型準(zhǔn)確率、精確率、召回率等,試驗(yàn)結(jié)果如表3所示。
準(zhǔn)確率可以很大程度上說明模型的優(yōu)劣,對(duì)于一般的分類問題,我們希望模型越準(zhǔn)確越好。由圖1可以看出,本試驗(yàn)所采用的 Ground-Truth 數(shù)據(jù)集存在嚴(yán)重的數(shù)據(jù)不平衡問題,最少的黑嘴雀魚只有16張圖片,而最多的網(wǎng)紋宅泥魚則多達(dá)12, 112張圖片,相差750多倍。此種情況下,模型只需要將分類結(jié)果分成網(wǎng)紋宅泥魚就會(huì)得到很高的準(zhǔn)確率,僅依靠準(zhǔn)確率指標(biāo)已經(jīng)無法準(zhǔn)確評(píng)估模型效果。由表3可以看出基于GoogLeNet訓(xùn)練的模型參數(shù)量最小,AlexNet模型的浮點(diǎn)計(jì)算量最小但同時(shí)參數(shù)量最大, DenseNet169的模型參數(shù)量和浮點(diǎn)計(jì)算次數(shù)比較均衡,基于 ResNet50訓(xùn)練的模型驗(yàn)證集精確度最高達(dá)到了99.26%,但同時(shí)召回率低于 DenseNet169。通常精確率越高,召回率就會(huì)越低。由公式(5) 可以看到 F1值能夠更全面地衡量模型優(yōu)劣,更適合用于評(píng)估本試驗(yàn)?zāi)P汀8鶕?jù)表3的結(jié)果,進(jìn)一步計(jì)算各個(gè)模型整體的 F1值,結(jié)果如表4所示。
As Table 4 shows, the DenseNet169 model achieved an overall F1 score of 0.9742, the highest obtained in this experiment. Although the model trained on GoogLeNet has fewer parameters and fewer floating-point operations than the DenseNet169 model, the model is deployed on a server with ample computing power and the accuracy of the fish identification application takes priority, so the more accurate DenseNet169 model was chosen for deployment.
4 水下魚識(shí)別模型部署
4.1服務(wù)器實(shí)現(xiàn)
(1)開發(fā)平臺(tái)
操作系統(tǒng):Ubuntu 20.04。
平臺(tái)環(huán)境:? Python 3.8, CUDA 11.2,Py‐Torch 1.7.0, Anaconda 2.0.3, Django 3.2.4[33] ,Gunicorn 20.0.4[34]等。
開發(fā)工具:PyCharm Professional 2019.2。
數(shù)據(jù)庫:MySQL。
為方便模型調(diào)用,本研究選擇了 Python語言進(jìn)行服務(wù)器開發(fā),在配備 GT1650顯卡的 Ubuntu系統(tǒng)上安裝了英偉達(dá)顯卡驅(qū)動(dòng),在 Anaconda 創(chuàng)建虛擬環(huán)境并在虛擬環(huán)境中安裝 Django 、Py‐Torch 、numpy、pandas 等必要工具包完成環(huán)境配置。
(2)項(xiàng)目搭建
在 PyCharm Professional中新建 Django項(xiàng)目,并將 Python 解釋器配置成上述虛擬環(huán)境,按照Django框架規(guī)范在配置文件里配置好數(shù)據(jù)庫和安全策略等各類中間件,創(chuàng)建FishInfo類映射到數(shù)據(jù)庫并納入 Django Admin 管理,在 urls.py 文件中完成 URL映射,服務(wù)器項(xiàng)目結(jié)構(gòu)如圖5所示。
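A minimal sketch of the URL-mapping step; the app, route and view names are hypothetical, and the actual project structure is the one shown in Fig. 5:

```python
# urls.py (sketch): route names are assumptions, not the project's real paths
from django.contrib import admin
from django.urls import path

from fishapp import views  # hypothetical app name

urlpatterns = [
    path("admin/", admin.site.urls),             # Django Admin for FishInfo
    path("api/identify/", views.identify_fish),  # receives the Base64 image
]
```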
(3) Model integration

Following the official PyTorch documentation[35], the trained weights were packaged into a *.tar file. A classifier class was created; its __init__ method uses PyTorch's torch APIs to initialize the model, and a predict function implements the prediction logic. The classifier class is then imported in Django's views.py, and image data is passed to a classifier object to invoke the model. The model integration and initialization code is shown in Fig. 6.
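A minimal sketch of such a wrapper; the authoritative code is in Fig. 6, and the checkpoint file name, the checkpoint key and the confidence handling below are assumptions:

```python
import io

import torch
from PIL import Image
from torchvision import models, transforms

class Classifier:
    """Wrapper around the trained model (a sketch, not the code in Fig. 6)."""
    def __init__(self, checkpoint="fish_densenet169.tar", num_classes=23):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = models.densenet169()
        self.model.classifier = torch.nn.Linear(
            self.model.classifier.in_features, num_classes)
        state = torch.load(checkpoint, map_location=self.device)
        self.model.load_state_dict(state["model_state_dict"])  # assumed key
        self.model.to(self.device).eval()
        self.tf = transforms.Compose([
            transforms.Resize((260, 260)),  # same input size as in training
            transforms.ToTensor(),
        ])

    @torch.no_grad()
    def predict(self, image_bytes):
        """Return (class index, confidence) for one image given as raw bytes."""
        img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        x = self.tf(img).unsqueeze(0).to(self.device)
        probs = torch.softmax(self.model(x), dim=1)
        conf, idx = probs.max(dim=1)
        return idx.item(), conf.item()
```

In views.py, a single module-level Classifier instance can then serve all requests, so the model is loaded once at startup rather than per request.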
(4) Fish information storage

The system stores structured fish information in MySQL. The core table is the FishInfo table, which holds the descriptive information for the fish in the Ground-Truth dataset, collected from the Chinese marine fish database[36] and the Fish Database of Taiwan[37]. The design of the FishInfo table is shown in Table 5.
(5)服務(wù)發(fā)布
由于 Django本身帶的調(diào)試服務(wù)器性能較弱,為提高服務(wù)器并發(fā)效率,本研究使用Guni‐corn [34] 多線程形式來發(fā)布服務(wù)端程序,最終服務(wù)器的相關(guān)環(huán)境和提供的接口服務(wù)如圖7所示。
4.2安卓APP 實(shí)現(xiàn)
(1)開發(fā)平臺(tái)
操作系統(tǒng):macOS 11.2。
平臺(tái)環(huán)境:? JDK 1.8.0_211, Android? SDKPlatform 30。
開發(fā)工具:AndroidStudio 4.1.2[38]。
在 Mac 電腦上安裝 JDK 、Android SDK 并配置好環(huán)境變量,在AndroidStudio IDE中完成具體的 APP程序編碼工作。
(2)項(xiàng)目搭建
為提高編碼效率,APP引入了OKHttp [39]輕量級(jí)網(wǎng)絡(luò)框架,在子線程中收發(fā)識(shí)別請(qǐng)求,由于谷歌規(guī)定安卓代碼在子線程中不能更新界面,為方便界面交互引入了EventBus [40]來完成跨線程事件傳遞。以上兩個(gè)依賴均為開源項(xiàng)目依賴,在AndroidStudio的build.gradle配置文件中添加相應(yīng)的倉(cāng)庫地址即可完成依賴項(xiàng)導(dǎo)入。APP項(xiàng)目結(jié)構(gòu)如圖8所示。
(3) Image capture and upload

For image capture, the APP locks the screen to portrait orientation to maximize compatibility and uses the Camera class from the Android SDK to drive the camera and grab image frames. Each raw NV21 frame is wrapped into a Bitmap, compressed to a fixed 640×480 pixels, encoded as a Base64 string, and uploaded to the server through OkHttp on a worker thread to request identification.
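The on-device code is Java, but the request protocol itself (a Base64-encoded image in an HTTP POST) is easy to illustrate. A minimal Python test client under those assumptions; the server address, endpoint path and field name are hypothetical:

```python
import base64

import requests  # illustrative test client only; the real client is the Android APP

SERVER = "http://192.168.1.10:8000/api/identify/"  # hypothetical server address

with open("test_fish.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(SERVER, data={"image": image_b64}, timeout=5)
print(resp.json())  # e.g. species name, confidence and knowledge-base fields
```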
(4)界面實(shí)現(xiàn)
當(dāng)服務(wù)器放回的識(shí)別可信度達(dá)到一定閾值后,APP顯示相關(guān)魚類信息,識(shí)別結(jié)果在 APP子線程中接收,通過EventBus完成識(shí)別結(jié)果的事件分發(fā),最終在主線程的界面完成識(shí)別結(jié)果顯示。APP的識(shí)別結(jié)果界面如圖9所示。
4.3系統(tǒng)測(cè)試
(1)測(cè)試環(huán)境
軟件環(huán)境:魚類在線識(shí)別系統(tǒng) V1.0。
移動(dòng)終端:紅米 K40手機(jī)。
服務(wù)器環(huán)境:配備 Intel i7-8700處理器和英偉達(dá) GTX1650顯卡的聯(lián)想臺(tái)式機(jī)。
網(wǎng)絡(luò)環(huán)境:校園無線局域網(wǎng)。
場(chǎng)地環(huán)境:實(shí)驗(yàn)室環(huán)境40 W熒光燈下。
(2)測(cè)試方法
通過在程序中加入耗時(shí)計(jì)算代碼測(cè)試識(shí)別過程耗時(shí),計(jì)量模型識(shí)別耗時(shí)以及從手機(jī)終端發(fā)出網(wǎng)絡(luò)請(qǐng)求到收到識(shí)別信息的總時(shí)延,具體的測(cè)試操作如圖10所示,在紅米 K40手機(jī)上安裝軟件后,在臺(tái)式電腦顯示器上打開驗(yàn)證集中的魚類圖片,利用手機(jī)后置攝像頭拍照進(jìn)行魚類識(shí)別,手機(jī)軟件上實(shí)時(shí)顯示識(shí)別結(jié)果和識(shí)別耗時(shí)。
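A minimal sketch of the round-trip timing idea, expressed in Python for consistency with the server code; the actual on-device timing is implemented in the Java APP, and the names here are illustrative:

```python
import time

import requests  # reuses the same HTTP endpoint as the upload example above

def timed_identify(server_url, payload):
    """Return (response JSON, round-trip latency in milliseconds)."""
    start = time.perf_counter()
    resp = requests.post(server_url, data=payload, timeout=5)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return resp.json(), elapsed_ms
```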
(3)測(cè)試結(jié)果
使用上述環(huán)境和方法對(duì)魚識(shí)別系統(tǒng)應(yīng)用的實(shí)際測(cè)試,測(cè)試結(jié)果如表6所示。由表6可見,在上述測(cè)試環(huán)境下,APP可以在百毫秒級(jí)準(zhǔn)確識(shí)別魚類圖片,并且延遲的主要原因是網(wǎng)絡(luò)傳輸耗時(shí),基本可以滿足識(shí)別應(yīng)用需要。
5 結(jié)論與展望
本研究以水下魚類實(shí)時(shí)識(shí)別任務(wù)為切入點(diǎn),提供了一個(gè)完整的魚類識(shí)別任務(wù)模型選擇、訓(xùn)練的評(píng)價(jià)的思路,并提出了一種基于安卓移動(dòng)終端的遠(yuǎn)程魚類識(shí)別解決方案,該方案具有良好的擴(kuò)展性,可以為其它圖像識(shí)別任務(wù)提供重要參考,所實(shí)現(xiàn)系統(tǒng)可以應(yīng)用于魚類知識(shí)科普、魚類養(yǎng)殖等各類需要識(shí)別魚類的場(chǎng)景。本次試驗(yàn)結(jié)果表明: (1) 基于 DenseNet169訓(xùn)練的模型整體 F1 值達(dá)到了0.9742,是本試驗(yàn)獲得的最優(yōu)模型。(2) 本試驗(yàn)提出的基于安卓移動(dòng)終端的遠(yuǎn)程魚類識(shí)別解決方案取得了良好的識(shí)別效果,具有較好的識(shí)別實(shí)時(shí)性。
后續(xù)研究還可以在模型壓縮裁剪領(lǐng)域展開工作,對(duì)所部署模型進(jìn)行裁剪壓縮,進(jìn)一步提高模型識(shí)別效果。
參考文獻(xiàn):
[1] NELSON J S, GRANDE T C, WILSON M V. Fishes of the world[M]. New Jersey: John Wiley & Sons, 2016.
[2] YANG L, LIU Y, YU H, et al. Computer vision models in intelligent aquaculture with emphasis on fish detection and behavior analysis: A review[J]. Archives of Computational Methods in Engineering, 2021, 28(4): 2785-2816.
[3] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[4] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Piscataway, New York, USA: IEEE, 2005: 886-893.
[5] LU H, ZHANG Q. Applications of deep convolutional neural network in computer vision[J]. Journal of Data Acquisition and Processing, 2016, 31(1): 1-17. (in Chinese)
[6] AL-SAFFAR A A M, TAO H, TALAB M A. Review of deep convolution neural network in image classification[C]//2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET). Piscataway, New York, USA: IEEE, 2017: 26-31.
[7] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[8] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J/OL]. arXiv: 1409.1556, 2014.
[9] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//The IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, New York, USA: IEEE, 2015: 1-9.
[10] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//The IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, New York, USA: IEEE, 2016: 770-778.
[11] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//The IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, New York, USA: IEEE, 2017: 4700-4708.
[12] HU H, YANG Y. A combined GLQP and DBN-DRF for face recognition in unconstrained environments[C]//2017 2nd International Conference on Control, Automation and Artificial Intelligence (CAAI 2017). Paris: Atlantis Press, 2017: 553-557.
[13] NOVAKOVIĆ J D, VELJOVIĆ A, ILIĆ S S, et al. Evaluation of classification models in machine learning[J]. Theory and Applications of Mathematics & Computer Science, 2017, 7(1): 39-46.
[14] ERTOSUN M G, RUBIN D L. Probabilistic visual search for masses within mammography images using deep learning[C]//2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, New York, USA: IEEE, 2015: 1310-1315.
[15] LIN C, LIN C, WANG S, et al. Multiple convolutional neural networks fusion using improved fuzzy integral for facial emotion recognition[J]. Applied Sciences, 2019, 9(13): ID 2593.
[16] BHATIA G S, AHUJA P, CHAUDHARI D, et al. Farmguide: One-stop solution to farmers[J/OL]. Asian Journal For Convergence In Technology, 2019, 4(1). [2021-09-06]. https://asianssr.org/index.php/ajct/article/view/789.
[17] 中國(guó)互聯(lián)網(wǎng)絡(luò)信息中心.中國(guó)互聯(lián)網(wǎng)絡(luò)發(fā)展?fàn)顩r統(tǒng)計(jì)報(bào)告[EB/OL].[2021-09-07]. http://www. cac.gov.cn/2021-02/03/c_1613923423079314.htm.
[18]孫彥博.2021年中國(guó)手機(jī)操作系統(tǒng)行業(yè)研究報(bào)告[R].南京:頭豹研究院, 2021.
[19] BOOM? B? J,? HUANG? P X,? HE? J,? et? al. Supportingground-truth annotation of image datasets using cluster‐ing[C]// The 21st International Conference on PatternRecognition? (ICPR2012). Piscataway,? New? York,USA: IEEE, 2012:1542-1545.
[20] MAKANTASIS K, KARANTZALOS K, DOULAMISA,? et? al. Deep? supervised? learning? for? hyperspectraldata? classification? through? convolutional ?neural? net‐works[C]//2015 IEEE? International? Geoscience? andRemote? Sensing? Symposium (IGARSS). Piscataway,New York, USA: IEEE, 2015:4959-4962.
[21]魏秀參.解析深度學(xué)習(xí):卷積神經(jīng)網(wǎng)絡(luò)原理與視覺實(shí)踐[M].北京:電子工業(yè)出版社, 2018:13-14.
[22] RUMELHART D E, HINTON G E, WILLIAMS R J.Learning representations by back-propagating errors[J].nature, 1986, 323(6088):533-536.
[23] BOUVRIE? J. Notes? on? convolutional? neural? net‐works[J/OL].(2006-11-22)[2021-09-06]. https://www.researchgate.net/publication/28765140_Notes_on_Con‐volutional_Neural_Networks.
[24] KHAN A, SOHAIL A, ZAHOORA U, et al. A surveyof the recent architectures of deep convolutional neuralnetworks[J]. Artificial? Intelligence? Review, 2020, 53(8):5455-5516.
[25] RAHMAD F, SURYANTO Y, RAMLI K. Performancecomparison? of anti-spam? technology? using? confusionmatrix classification[C]// IOP Conference Series: Mate‐rials? Science? and? Engineering. Bandung,? Indonesia:IOP Publishing, 2020:12076.
[26] WANG H, ZHAO T, LI L C, et al. A hybrid CNN fea‐ture model for pulmonary nodule malignancy risk dif‐ferentiation[J]. Journal of X-ray Science and Technolo‐gy, 2018, 26(2):171-187.
[27] KAMILARIS? A,? PRENAFETA-BOLD?? F? X. A? re‐view of the use of convolutional neural networks in ag‐riculture[J]. The Journal of Agricultural Science, 2018,156(3):312-322.
[28]梁紅, 金磊磊, 楊長(zhǎng)生.小樣本情況基于深度學(xué)習(xí)的水下目標(biāo)識(shí)別研究[J].武漢理工大學(xué)學(xué)報(bào)(交通科學(xué)與工程版), 2019, 43(1):6-10.
LIANG H, JIN L, YANG C. Research on underwatertarget recognition based on depth learning with smallsample[J]. Journal of Wuhan University of Technology(Transportation? Science & Engineering), 2019, 43(1):6-10.
[29] ZHANG C, JIANG P, HOU Q, et al. Delving deep into label smoothing[J]. IEEE Transactions on Image Processing, 2021, 30: 5984-5996.
[30] GitHub - lessw2020/Ranger-Deep-Learning-Optimizer: Ranger: a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase[EB/OL]. [2021-08-30]. https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer.
[31] LIU L, JIANG H, HE P, et al. On the variance of the adaptive learning rate and beyond[J/OL]. arXiv: 1908.03265, 2019.
[32] ZHANG M R, LUCAS J, HINTON G, et al. Lookahead optimizer: k steps forward, 1 step back[J/OL]. arXiv: 1907.08610, 2019.
[33] Django overview[EB/OL]. [2021-07-28]. https://www.djangoproject.com/start/overview/.
[34] Gunicorn - Python WSGI HTTP server for UNIX[EB/OL]. [2021-07-28]. https://gunicorn.org/.
[35] PyTorch documentation[EB/OL]. [2021-07-28]. https://pytorch.org/docs/stable/index.html.
[36] Chinese marine fish database[EB/OL]. [2021-08-30]. http://sea.fundiving.cn. (in Chinese)
[37] The Fish Database of Taiwan[EB/OL]. [2021-08-30]. https://fishdb.sinica.edu.tw/chi/species.php. (in Chinese)
[38] Android developers[EB/OL]. [2021-07-28]. https://developer.android.google.cn/studio.
[39] OkHttp[EB/OL]. [2021-08-30]. https://square.github.io/okhttp/.
[40] GitHub - asaskevich/EventBus: [Go] Lightweight eventbus with async compatibility for Go[EB/OL]. [2021-08-30]. https://github.com/asaskevich/EventBus.
Underwater Fish Species Identification Model and Real-Time Identification System
LI Shaobo1,2,3, YANG Ling1,2,3, YU Huihui4, CHEN Yingyi1,2,3*
(1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; 2. National Innovation Center for Digital Fishery, China Agricultural University, Beijing 100083, China; 3. Beijing Engineering and Technology Research Centre for the Internet of Things in Agriculture, Beijing 100083, China; 4. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China)
Abstract: Convolutional neural network models have different advantages and disadvantages, and it is becoming more and more difficult to select an appropriate model for an actual fish identification project. Identifying underwater fish is a challenging task, because images from the real underwater environment suffer from varying illumination, low contrast, high noise, low resolution and sample imbalance between categories. In addition, deploying models to mobile devices directly reduces their accuracy sharply. To solve these problems, the Fish Recognition Ground-Truth dataset, provided by the Fish4Knowledge project of the University of Edinburgh, was used to train the models in this study. It contains 27,370 images of 23 fish species and has been labeled manually by marine biologists. AlexNet, GoogLeNet, ResNet and DenseNet models were selected initially according to the characteristics of the real-time underwater fish identification task, and a comparative experiment was designed to find the best network model. Random image flipping, rotation and color dithering were used to augment the data, in response to the limited number of underwater fish images. Considering the serious imbalance in the number of samples per category, label smoothing was used to alleviate model overfitting. The Ranger optimizer and a cosine learning-rate decay strategy were used to further improve training. The precision and recall of each model were recorded and compared. The results showed that the precision and recall of the fish identification model based on DenseNet reached 99.21% and 96.77% on the validation set, and its F1 score reached 0.9742, the best among the models obtained in the experiment. Finally, a remote fish identification system was designed in Python: the model was deployed to a Linux server, and an Android APP uploads fish images over HTTP, requests the server to identify them, and displays the identification information returned by the server, such as fish species, profile, habits and distribution. Recognition tests performed on a real Android phone showed that, within the same local area network, the APP could display fish information quickly and correctly within 1 s.
Key words: fish identification model; CNN; model evaluation; Android; Ground-Truth; real-time identification system