DOI: 10.15938/j.jhust.2022.04.006
CLC Number: TP391
Document Code: A
Article ID: 1007-2683(2022)04-0039-07
RGB-D Image Saliency Detection Based on Multi-branch Backbone Supervised Network
WANG Wei-bing, ZHANG Xiao-zhuo, DENG Qiang
(1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China;
2. Online Operation Center of Harbin Power Supply Company, Heilongjiang Electric Power Co., Ltd., Harbin 150036, China)
Abstract: Existing RGB-D image saliency detection techniques struggle to fully exploit the effective information in depth images and cannot fuse RGB features and depth features effectively. To address this problem, an RGB-D image saliency detection method based on a multi-branch backbone supervised network is proposed. Multi-level features of the RGB and depth images are obtained with a ResNet-50 backbone, and a depth change module extracts useful depth feature information from the perspectives of channel and spatial attention. A feature grouped supervision fusion module then performs multi-scale, multi-modal fusion of the RGB and depth features in groups from high level to low level, following the theory of convolutional neural networks; each group incorporates the fusion result of the level above and is supervised by the ground-truth map, and the predicted saliency map is obtained iteratively. Experiments on four representative datasets against state-of-the-art RGB-D image saliency detection models show that the proposed model achieves the smallest mean absolute error and improves the F-measure, E-measure and S-measure, outperforming the other models with good robustness.
Keywords: RGB-D image saliency detection; multi-branch backbone supervised network; neural network; attention mechanism; multimodal fusion
0 Introduction
The key to salient object detection is extracting the most eye-catching, important regions of a target scene. In recent years, many researchers have explored salient object detection in the field of computer vision and applied it to semantic segmentation [1], image classification [2], image compression [3], image segmentation [4] and other areas. Over the past few years a variety of salient object detection models for RGB-D images have been proposed, and the emergence of depth sensors such as Microsoft Kinect has made depth images easier to capture. Current salient object detection methods and techniques nevertheless still have shortcomings.
In RGB-D data, RGB images and depth images come in pairs: the RGB image provides detailed color and texture information, while the depth image provides spatial information such as the shape and position of the target region. Owing to the limitations of acquisition devices, datasets contain low-quality depth images with blurred edges or noise, and overcoming their influence to extract useful feature information is one of the keys to improving saliency detection performance. The JL-DCF network [5] treats the depth image as a special case of a color image and extracts features with a shared CNN; the DPANet network [6] uses a depth-awareness module to assess the potential of the depth map and reduce the influence of contamination; the D3Net network [7] introduces a depth filtering unit that discards depth images harmful to performance.
Once high-quality multi-scale features have been captured from the RGB and depth images, fusing them effectively into a high-quality saliency map is another hot topic in saliency detection research. The CPFP model [8] proposes a fluid pyramid integration module that fuses cross-modal information hierarchically; TAN [9] introduces a channel-wise attention mechanism for selective cross-modal, cross-level feature fusion. These methods explore effective feature matching and fusion from different angles, and the quality of the fusion largely determines detection performance.
To address the above problems, this paper adopts a novel multi-branch backbone supervised network for RGB-D image saliency detection.
The main contributions of this paper are:
① To fuse the useful features of all levels as fully and comprehensively as possible, a multi-branch backbone supervised network structure is adopted for salient object detection in RGB-D images. Within it, a feature grouped supervision fusion module (FGM) built on the convolutional neural network exploits the ability of high-level features to guide the refinement of low-level detail features, supervising and optimizing the result iteratively in groups from high level to low level.
② To exploit the feature information of depth images effectively and reduce the influence of low-quality depth maps, a depth change module (DCM) based on the attention mechanism is introduced, which strengthens the saliency representation ability of the depth features.
③ Experiments on widely used datasets show that the proposed network model outperforms current state-of-the-art models in RGB-D image saliency detection and has good robustness.
1 Method
Most current RGB-D image saliency detection techniques aggregate multi-modal, multi-scale features directly [10-11]. This paper instead adopts a two-stream structure: the RGB image and the depth image are fed in separately, independent features are obtained, and these are then processed and fused, as sketched below.
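As a minimal sketch of this two-stream design (an illustration under stated assumptions, not the authors' released code), the following PyTorch snippet extracts five levels of features from the RGB and depth inputs with two independent ResNet-50 branches whose final pooling and fully connected layers are discarded; the module names and the 3-channel replication of the depth map are assumptions.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoStreamBackbone(nn.Module):
    """Two independent ResNet-50 branches that return five feature levels
    each for the RGB input and the depth input."""
    def __init__(self):
        super().__init__()
        self.rgb_stages = self._split_stages(resnet50(pretrained=True))
        self.depth_stages = self._split_stages(resnet50(pretrained=True))

    @staticmethod
    def _split_stages(net):
        # Stage 0: stem (conv1 + bn + relu); stages 1-4: the residual blocks.
        # The final average pooling and fully connected layers are dropped.
        return nn.ModuleList([
            nn.Sequential(net.conv1, net.bn1, net.relu),
            nn.Sequential(net.maxpool, net.layer1),
            net.layer2, net.layer3, net.layer4,
        ])

    @staticmethod
    def _run(stages, x):
        feats = []
        for stage in stages:
            x = stage(x)
            feats.append(x)
        return feats  # ordered low level -> high level

    def forward(self, rgb, depth):
        # Repeating the 1-channel depth map to 3 channels so the same
        # backbone stem applies is a common convention (an assumption here).
        if depth.size(1) == 1:
            depth = depth.repeat(1, 3, 1, 1)
        return self._run(self.rgb_stages, rgb), self._run(self.depth_stages, depth)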
1.1 Multi-branch Backbone Supervised Network
1.2 Depth Change Module
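The abstract describes the depth change module as refining depth features from the channel- and spatial-attention perspectives. A plausible CBAM-style sketch of that idea follows; the layer sizes, reduction ratio and 7x7 spatial kernel are illustrative assumptions, not the paper's specification.

import torch
import torch.nn as nn

class DepthChangeModule(nn.Module):
    """Re-weights a depth feature map along channels, then spatially,
    to suppress responses from noisy, low-quality depth regions."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze the spatial dimensions, score channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: score locations from pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, d):
        d = d * self.channel_gate(d)                  # channel re-weighting
        avg = d.mean(dim=1, keepdim=True)             # per-location mean
        mx, _ = d.max(dim=1, keepdim=True)            # per-location max
        return d * self.spatial_gate(torch.cat([avg, mx], dim=1))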
1.3 Feature Grouped Supervision Fusion Module
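As described in the abstract and contribution ①, each fusion group merges an RGB feature, a refined depth feature and the coarser prediction from the group above, and its side output is supervised by the ground-truth map. A minimal sketch of one such group and of the grouped supervision loss follows; the channel widths, the single merge convolution and the binary cross-entropy loss are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionGroup(nn.Module):
    """One FGM group: fuse RGB + depth features with the upper-level
    prediction and emit a side-output saliency map."""
    def __init__(self, rgb_ch, depth_ch, mid_ch=64):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Conv2d(rgb_ch + depth_ch + 1, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        self.predict = nn.Conv2d(mid_ch, 1, 1)

    def forward(self, f_rgb, f_depth, coarse):
        # Upsample the coarser prediction to this group's resolution.
        coarse = F.interpolate(coarse, size=f_rgb.shape[2:],
                               mode='bilinear', align_corners=False)
        fused = self.merge(torch.cat([f_rgb, f_depth, coarse], dim=1))
        return self.predict(fused)  # saliency logits for this group

def grouped_supervision_loss(side_outputs, gt):
    # Every group's side output is compared with the resized ground truth.
    loss = 0.0
    for s in side_outputs:
        g = F.interpolate(gt, size=s.shape[2:], mode='bilinear',
                          align_corners=False)
        loss = loss + F.binary_cross_entropy_with_logits(s, g)
    return loss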
1.4 Code Description of the Proposed Model
2 Experiments
2.1 Datasets
2.2 Evaluation Metrics
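The comparisons below report the mean absolute error (MAE), F-measure, E-measure and S-measure. For reference, a NumPy sketch of the two simplest ones under the usual conventions (saliency maps scaled to [0, 1], adaptive threshold at twice the mean saliency value, β^2 = 0.3 as is common in the literature [22]) is given here; the S-measure [24] and E-measure involve structural and alignment terms and are omitted.

import numpy as np

def mae(pred, gt):
    # Mean absolute error between the saliency map and the binary ground truth.
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def f_measure(pred, gt, beta2=0.3):
    # Binarize with the adaptive threshold, then combine precision and recall.
    thr = min(2.0 * pred.mean(), 1.0)
    binary = pred >= thr
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)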
2.3 Implementation Details
The experiments were run on Windows 10 with an Intel Core i7-7700HQ CPU (2.8 GHz) and a GTX 1080 Ti GPU, using the PyTorch deep learning framework [25]. A pretrained ResNet-50 with the final pooling and fully connected layers removed serves as the backbone. The learning rate is set to 10^-4 and decreased by a factor of 10 every 50 epochs, and data augmentation is performed by flipping and border-cropping the images. With a batch size of 10, training the model for 50 epochs takes about 6 h.
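A minimal sketch of this schedule using PyTorch's StepLR is shown below; the choice of Adam and the stand-in module are assumptions, since the text does not name the optimizer.

import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)       # stand-in for the full network
optimizer = Adam(model.parameters(), lr=1e-4)           # initial learning rate 10^-4
scheduler = StepLR(optimizer, step_size=50, gamma=0.1)  # divide by 10 every 50 epochs

for epoch in range(50):                                 # 50 epochs, batch size 10
    # ... one pass over the flipped / border-cropped training batches ...
    scheduler.step()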
2.4 Comparison with State-of-the-art Methods
Results comparison: the proposed model is compared experimentally with four state-of-the-art RGB-D image saliency detection models, CPFP [8], CTMF [26], TAN [9] and BBSNet [12]. Fig. 4 and Fig. 5 show the comparison of mean absolute error (MAE) and E-measure, respectively; a smaller MAE and a larger E-measure indicate better performance, and the proposed model clearly achieves the highest E-measure and the smallest MAE on all datasets. The F-measure and S-measure results are listed in Table 1: on the NJU2000 dataset the proposed model improves the F-measure over CPFP, CTMF, TAN and BBSNet by 3.34%, 6.04%, 6.16% and 0.45% respectively, with gains of varying degrees on the other datasets, and its S-measure is also higher than that of the other models. Taken together, these results show that the proposed model detects saliency well, outperforms the other models on the overall evaluation metrics, and is competitive.
Visual comparison: Fig. 6 shows the saliency maps output by each model under different conditions that affect detection. The first row of Fig. 6 shows a single object against an ordinary background, where the object edges identified by the proposed model are sharper. The second row shows recognition under light interference: reflections tend to alter an object's apparent color or shape, but the proposed model overcomes this influence and identifies the target better. The third row shows multiple objects in a complex scene, all of which the proposed model detects clearly. The fourth row shows objects in a low-contrast scene, where the proposed model makes full use of the useful features of the depth image and obtains reliable results.
2.5 Experimental Comparison of the Feature Grouped Supervision Fusion Module
The effectiveness of the feature grouped supervision fusion module is tested, with visual results shown in Fig. 7. Fig. 7(a) shows the per-layer outputs of an NFGM (no feature grouped supervision module) variant that uses only simple convolution operations, and Fig. 7(b) shows the saliency maps S1~S4 generated at each stage of FGM fusion by the proposed model, where the object edges visibly sharpen layer by layer until a high-quality result map is obtained. Compared with the corresponding outputs of the proposed model, the per-layer saliency maps of the NFGM variant have blurred object contours and obvious redundant features; combined with the data in Table 2, the proposed model is far better than the NFGM variant on every evaluation metric. Both visually and by the evaluation criteria, therefore, the feature grouped supervision fusion module introduced here accomplishes high-quality feature fusion and improves the model's saliency detection performance.
2.6 Experimental Comparison of the Depth Change Module
The effectiveness of the depth change module is compared experimentally, with results given in Table 3, where NDCM (no depth change module) denotes the variant without enhancement of the depth images. The proposed algorithm outperforms the NDCM variant on every performance metric, indicating that depth information significantly improves the model's performance, brings considerable gains, and guides object detection with spatial detail.
3 Conclusion
Based on convolutional neural networks, this paper proposes a multi-branch backbone supervised network framework that introduces a depth change module and a feature grouped supervision fusion module and outputs saliency predictions through high-to-low iterative refinement. The model achieves good results on four representative datasets and shows strong robustness. Future work could develop an end-to-end framework in which depth-module refinement and multi-modal feature fusion are performed jointly, strengthening the study of their correlation.
References:
[1]SU W, WANG Z F. Widening Residual Refine Edge Reserved Neural Network for Semantic Segmentation[J]. Multimedia Tools and Applications, 2019, 78(13):18229.
[2] CHEN Yu, ZHOU Yujia, DING Hui. An XNet-CNN Diabetic Retinal Image Classification Method[J]. Journal of Harbin University of Science and Technology, 2020, 25(1): 73.
[3] GUO Chenlei, ZHANG Liming. A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression[J]. IEEE Transactions on Image Processing, 2010, 19(1): 185.
[4] ZHU Suxia, ZU Hongliang, SUN Guanglu. Image Segmentation Algorithm Named FSICM Based on Spatial Information[J]. Journal of Harbin University of Science and Technology, 2020, 25(4): 101.
[5] FU K, FAN D P, JI G P, et al. JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA, 2020, 11(3): 404.
[6] CHEN Z, CONG R, XU Q, et al. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection[J]. IEEE Transactions on Image Processing, 2020, 24(8): 3736.
[7] FAN D P, LIN Z, ZHANG Z, et al. Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 11(36): 325.
[8] ZHAO J, CAO Y, FAN D, et al. Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA, 2019, 29(3): 2925.
[9]CHEN H, LI Y. Three-Stream Attention-Aware Network for RGB-D Salient Object Detection[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2825.
[10] ZHU C, CAI X, HUANG K, et al. PDNet: Prior-Model Guided Depth-Enhanced Network for Salient Object Detection[C]//2019 IEEE International Conference on Multimedia and Expo (ICME). Shanghai, China, 2019, 36(7): 199.
[11] CHEN S, TAN X, WANG B, et al. Reverse Attention-Based Residual Network for Salient Object Detection[J]. IEEE Transactions on Image Processing, 2020, 29(1): 3763.
[12]FAN D P, ZHAI Y J, BORJI A, et al. BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network[J]. Computer Vision-ECCV,2020, 12357(1): 275.
[13] LIU S, HUANG D, WANG Y. Receptive Field Block Net for Accurate and Fast Object Detection[C]//ECCV, 2018, 33(1): 404.
[14]LI C. ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection[J]. IEEE Transactions on Cybernetics, 2021, 51(1): 88.
[15] SUN Guanglu, WU Meng, QIU Jing, et al. Deep Memory Fusion Model for Long Video Question Answering[J]. Journal of Harbin University of Science and Technology, 2021, 26(1): 1.
[16] LIU Zhengyi, DUAN Quntao, SHI Song, et al. RGB-D Image Saliency Detection Based on Multi-modal Feature-fused Supervision[J]. Journal of Electronics and Information Technology, 2020, 42(4): 997.
[17] JU R, GE L, GENG W, et al. Depth Saliency Based on Anisotropic Center-surround Difference[C]//2014 IEEE International Conference on Image Processing (ICIP). Paris, France, 2014, 7025(22): 1115.
[18]PENG H W, LI B, XIONG W H, et al. RGB-D Salient Object Detection: A Benchmark and Algorithms[C]// Computer Vision-ECCV,2014,45(33):92.
[19]LI G, ZHU C. A Three-Pathway Psychobiological Framework of Salient Object Detection Using Stereoscopic Technology[C]// 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice, Italy, 2017,18(9):783.
[20]LI N, YE J, JI Y, et al. Saliency Detection on Light Field[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 8(39):1605.
[21] PERAZZI F, KRÄHENBÜHL P, PRITCH Y, et al. Saliency Filters: Contrast Based Filtering for Salient Region Detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, 3(46): 733.
[22] ACHANTA R, HEMAMI S, ESTRADA F, et al. Frequency-tuned Salient Region Detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, 4(13): 1597.
[23]BORJI A, CHENG M, JIANG H, et al. Salient Object Detection: A Benchmark[J]. IEEE Transactions on Image Processing, 2015, 12(24): 5706.
[24]FAN D, CHENG M, LIU Y, et al. Structure-Measure: A New Way to Evaluate Foreground Maps[C]//2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy, 2017,34(6):4598.
[25] STEINER B, DEVITO Z, CHINTALA S, et al. PyTorch: An Imperative Style, High-performance Deep Learning Library[C]//NIPS, 2019, 48(3): 8024.
[26] HAN J, CHEN H, LIU N, et al. CNNs-Based RGB-D Saliency Detection via Cross-View Transfer and Multiview Fusion[J]. IEEE Transactions on Cybernetics, 2018, 11(48): 3171.
(Editor: WEN Zeyu)