A Saliency Detection Network Based on Edge Detection and Skeleton Extraction
Yang Aiping1, Cheng Simeng1, Wang Jinbin1, Song Shangyang1, Ding Xuewen2
(1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2. School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China)
Some existing methods perform saliency detection through joint multitask learning, which improves detection accuracy to a certain extent, but false detections and missed detections remain. The reason is that the tasks have different optimization objectives and large feature-domain gaps, so the network cannot adequately discriminate features such as saliency and object boundaries. To address this, we propose a multitask-assisted saliency detection network built on edge detection and skeleton extraction, consisting of a feature extraction subnetwork, an edge detection subnetwork, a skeleton extraction subnetwork, and a saliency filling subnetwork. The feature extraction subnetwork uses a pretrained ResNet101 model to extract multi-scale image features. The edge detection subnetwork fuses the first three layers of features, fully preserving the boundary information of salient objects; the skeleton extraction subnetwork fuses the last two layers of features, accurately locating the centers of salient objects. The two subnetworks are trained separately on an edge detection dataset and a skeleton extraction dataset, and the best edge detection and skeleton extraction models are retained as pretrained models to assist the saliency detection task. To reduce the gap between the network's optimization objectives and feature domains, the saliency filling subnetwork is designed to fuse and nonlinearly map the extracted edge and skeleton features. Experimental results on four datasets show that the proposed method effectively recovers missing salient regions and outperforms other salient object detection methods.
edge detection; skeleton extraction; multitask; saliency detection network
Saliency detection uses computers to simulate the human visual system, quickly analyzing an input image and retaining its most attention-grabbing regions. It is widely used in computer vision tasks such as image retrieval [1], object detection [2], action recognition [3], image segmentation [4], and object recognition [5].
Existing saliency detection methods mainly follow the idea of "object detection first, boundary refinement second," using different tasks to detect salient object regions and salient object boundaries separately. According to how boundaries are detected, these methods can be divided into single-dataset multitask detection networks and multi-dataset multitask detection networks.
Single-dataset multitask detection networks generally design two parallel networks that detect salient object regions and salient object boundaries, both supervised on the DUTS dataset [6]. Wei et al. [7] decomposed the input image into a boundary map and an object-region map through mathematical operations and designed two subnetworks to learn them separately. Song et al. [8] proposed a saliency detection network with multi-level boundary refinement, which first obtains a coarse saliency prediction and then refines the salient boundaries. Because these methods rely on boundary detection operators whose computation introduces errors, the extracted boundaries are incomplete. Some researchers have therefore supervised the network with multiple datasets to improve its saliency discrimination and boundary extraction abilities. Wu et al. [9] jointly trained edge detection and saliency detection in a multitask manner to strengthen the network's feature extraction; however, this method does not consider cooperation among the detection tasks, so the extracted object regions and boundary features are incomplete. Building on this, Liu et al. [10] added a skeleton detection task and jointly trained the three tasks to improve the network's boundary detection and center localization. This method exchanges multitask information through a weight-sharing strategy that ignores the differences among the detection tasks, which leads to incomplete prediction maps.
The above analysis shows that most current multitask detection methods exchange information through feature stacking or weight sharing without considering the feature-domain differences among tasks, which leads to incomplete feature extraction. Unlike existing methods, this paper adopts a "divide-and-conquer" strategy and proposes a multi-dataset, multitask-assisted saliency detection network. Specifically, the edge detection and skeleton extraction tasks are trained independently, and the best edge detection and skeleton extraction models are retained as pretrained models to assist the salient object detection task: they extract edge features and skeleton features that accurately locate the boundaries and centers of salient objects, alleviating the incomplete feature extraction caused by feature-domain differences among task objectives. Finally, the extracted edge and skeleton features are fused and nonlinearly mapped to obtain a complete saliency map.
This paper proposes a multitask-assisted saliency detection network whose overall structure is shown in Fig. 1. The network consists of a feature extraction subnetwork, an edge detection subnetwork, a skeleton extraction subnetwork, and a saliency filling subnetwork. The feature extraction subnetwork extracts multi-scale features from the input image; it is a cascade of five residual convolution blocks, denoted RB1-RB5. The edge detection subnetwork extracts the contours of salient objects from the first three blocks (RB1, RB2, RB3) to obtain boundary information, while the skeleton extraction subnetwork extracts the skeletons of salient objects from the last two blocks (RB4, RB5) to locate their centers. To improve the network's discriminative ability, a pyramid convolution module enlarges the receptive field of the features, and a feature enhancement module adaptively weights them. Finally, the saliency filling subnetwork fills in the salient object according to its boundary information and center position, producing the saliency prediction map.
Fig. 1 Overall network architecture
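As a rough illustration of the data flow described above, the following NumPy sketch mimics how the five residual blocks are split between the two auxiliary subnetworks. All function names are ours, the "residual block" is reduced to plain downsampling, and fusion is reduced to upsample-and-average; this is a structural sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def residual_block(x):
    """Stand-in for one residual conv block: halve the spatial size."""
    return x[:, ::2, ::2]  # (C, H, W) -> (C, H/2, W/2)

def extract_features(image, num_blocks=5):
    """Cascade of blocks; collect the output of each (RB1..RB5)."""
    feats, x = [], image
    for _ in range(num_blocks):
        x = residual_block(x)
        feats.append(x)
    return feats

def fuse(feature_list, size):
    """Nearest-neighbour upsample each map to `size`, then average."""
    out = np.zeros((feature_list[0].shape[0], *size))
    for f in feature_list:
        ry, rx = size[0] // f.shape[1], size[1] // f.shape[2]
        out += np.repeat(np.repeat(f, ry, axis=1), rx, axis=2)
    return out / len(feature_list)

image = np.random.rand(3, 64, 64)
feats = extract_features(image)
edge_feat = fuse(feats[:3], (32, 32))   # RB1-RB3: boundary cues
skeleton_feat = fuse(feats[3:], (4, 4)) # RB4-RB5: centre cues
```

In the actual network the filling subnetwork would then combine both feature sets into the final saliency prediction.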
1.1.1 Pyramid convolution module
To enhance the network's global perception, inspired by pyramid network structures [12-13], we design a pyramid convolution module that fuses features obtained under multiple receptive fields. Its structure is shown in Fig. 2.
Fig. 2 Structure of the pyramid convolution module
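A minimal sketch of the idea behind the module: apply the same small kernel at several dilation rates so that each branch sees a different receptive field, then fuse the branches. Here a 3x3 mean filter stands in for the learned convolutions, and the dilation rates are illustrative assumptions of ours, not values taken from the paper.

```python
import numpy as np

def dilated_mean3x3(x, d):
    """3x3 mean filter with dilation d on a 2D map (zero padding),
    i.e. an effective receptive field of (2d+1) x (2d+1)."""
    p = np.pad(x, d)
    out = np.zeros_like(x)
    for dy in (-d, 0, d):
        for dx in (-d, 0, d):
            out += p[d + dy:d + dy + x.shape[0],
                     d + dx:d + dx + x.shape[1]]
    return out / 9.0

def pyramid_conv(x, dilations=(1, 2, 4)):
    """Fuse responses from several receptive-field sizes by averaging."""
    branches = [dilated_mean3x3(x, d) for d in dilations]
    return sum(branches) / len(branches)
```

In the real module each branch would be a learned convolution and the fusion a further convolution rather than a plain average.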
1.1.2 Feature enhancement module
To improve feature representation by selecting useful features and suppressing useless ones, we design a feature enhancement module that uses the channel attention mechanism [14] and the spatial attention mechanism [15] to filter and enhance features along the channel and spatial dimensions. Its structure is shown in Fig. 3.
Fig. 3 Feature enhancement module
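The following sketch shows channel attention followed by spatial attention in the spirit of [14-15], with the learned fully connected and convolutional layers of the real modules replaced by parameter-free pooling gates; it only illustrates how a feature map is reweighted along the two dimensions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """SE-style gate: global average pool per channel -> weight in (0,1)."""
    w = sigmoid(x.mean(axis=(1, 2)))   # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    """Gate each location by its cross-channel mean response."""
    w = sigmoid(x.mean(axis=0))        # (H, W)
    return x * w[None, :, :]

def feature_enhancement(x):
    """Channel reweighting followed by spatial reweighting."""
    return spatial_attention(channel_attention(x))
```

Since both gates lie in (0, 1), the module can only attenuate responses here; the learned versions can redistribute emphasis much more flexibly.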
The tasks are supervised in stages. The first stage is the edge detection task, for which the binary cross-entropy function [16] is chosen as the loss function, i.e.,
The second stage is the skeleton extraction task, whose loss function is
The third stage is the salient object detection task, whose loss function is
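For the first stage, the standard (unweighted) pixel-averaged binary cross-entropy [16] can be sketched as follows; the exact skeleton and saliency losses of the later stages are not reproduced here.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Pixel-averaged binary cross-entropy between a predicted
    probability map and a binary ground-truth map."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
```

For a confident, correct prediction the loss is near zero; e.g. `bce_loss` on predictions (0.9, 0.1) against targets (1, 0) gives -log(0.9) per pixel.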
The BSDS500 dataset [17] is used as the training set for the edge detection task; its training split contains 200 images, each with 3-4 ground-truth maps, one of which is randomly selected for training. The SK-LARGE dataset [18], containing 746 images, is the training set for the skeleton extraction task. The DUTS-TR dataset [6], containing 10,553 images, is the training set for the saliency detection task. DUTS-TE [6], ECSSD [19], HKU-IS [20], and PASCAL-S [21] serve as test sets.
To verify the effectiveness of the proposed method, it is compared with existing salient object detection methods in terms of both objective metrics and subjective quality. The compared methods include PAGR [22], PiCANet [23], MLM [9], ICNet [24], BASNet [25], AFNet [26], PAGE [27], CPD [28], ITSD [29], CANet [30], CAGNet [31], HERNet [8], and AMPNet [32].
Figure 4 shows the subjective comparison on several representative scenes: a salient object in a complex scene (row 1), a salient object similar to the background (row 2), a small salient object (row 3), a regular-shaped salient object (row 4), an occluded salient object (row 5), and multiple salient objects (row 6). The proposed method achieves good detection results in all cases. In the small-object image, most methods miss the distant duck (row 3); for occluded objects and complex scenes, most methods produce false detections, mistaking the yellow letters above the dog (row 1) and the leaf covering the bird (row 5) for salient objects; for elongated salient objects, the proposed method yields more precise object boundaries. As Fig. 4 shows, the proposed method subjectively outperforms the other multitask method (MLM [9]) across multiple scenes, indicating that the proposed multitask-assisted approach based on edge detection and skeleton extraction can effectively recover missing salient regions and resolve incomplete detections.
Table 1 Objective metrics of different salient object detection methods
Fig. 4 Subjective comparison of the proposed method and other methods
Table 2 compares the average speed of the proposed method with that of other methods. The proposed method runs faster than most saliency detection methods and remains competitive with ITSD [29] and CPD [28], two fast saliency detection networks.
Table 2 Comparison of the proposed method with other methods in terms of average speed
To verify the effect of the pyramid convolution module (PCM), ablation experiments were conducted with four settings: Experiment 1 uses no PCM for either shallow or deep features (without PCM, WPCM); Experiment 2 uses PCM only for shallow features (shallow PCM, SPCM); Experiment 3 uses PCM only for deep features (deep PCM, DPCM); Experiment 4 uses PCM for both shallow and deep features (both PCM, BPCM).
To verify the effectiveness of the feature enhancement module (FEM), ablation experiments were conducted on the four datasets with four settings: no FEM for either shallow or deep features (without FEM, WFEM) (Experiment 1); FEM only for shallow features (shallow FEM, SFEM) (Experiment 2); FEM only for deep features (deep FEM, DFEM) (Experiment 3); FEM for both shallow and deep features (both FEM, BFEM) (Experiment 4). The corresponding scores and MAE results for each setting are listed in Table 5.
Table 3 Ablation results
Table 4 Ablation results for the pyramid convolution module
Table 5 Ablation results for the feature enhancement module
This paper proposed a saliency detection network based on edge detection and skeleton extraction. By training the two auxiliary tasks separately and using them to assist the saliency detection network in generating complete saliency maps, the method effectively addresses missed and false detections of salient regions. Specifically, the input image is decomposed: the edge detection subnetwork and the skeleton extraction subnetwork obtain the boundary features and skeleton features of salient objects, respectively, accurately locating their boundaries and centers. To reduce the discrepancies among the tasks, a saliency filling subnetwork was designed that fills in the salient object region, taking the skeleton features as the center and the edge features as the boundary, to obtain a complete saliency map. In addition, a pyramid convolution module and a feature enhancement module were designed to filter and enhance the edge and skeleton features, improving the network's representational ability. Experimental results show that the proposed method reduces the difficulty of feature extraction while detecting salient objects completely and accurately.
[1] Babenko A,Lempitsky V. Aggregating local deep features for image retrieval[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago,Chile,2015:1269-1277.
[2] 庞彦伟,余珂,孙汉卿,等. 基于逐级信息恢复网络的实时目标检测算法[J]. 天津大学学报(自然科学与工程技术版),2022,55(5):471-479.
Pang Yanwei,Yu Ke,Sun Hanqing,et al. Hierarchical information recovery network for real-time object detection[J]. Journal of Tianjin University(Science and Technology),2022,55(5):471-479(in Chinese).
[3] Abdulmunem A,Lai Y K,Sun X. Saliency guided local and global descriptors for effective action recognition[J]. Computational Visual Media,2016,2(1):97-106.
[4] Zhou S P,Wang J J,Zhang S,et al. Active contour model based on local and global intensity information for medical image segmentation[J]. Neurocomputing,2016,186:107-118.
[5] Cao X C,Tao Z Q,Zhang B,et al. Self-adaptively weighted co-saliency detection via rank constraint[J]. IEEE Transactions on Image Processing,2014,23(9):4175-4186.
[6] Wang L J,Lu H C,Wang Y F,et al. Learning to detect salient objects with image-level supervision[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,USA,2017:136-145.
[7] Wei J,Wang S H,Wu Z,et al. Label decoupling framework for salient object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle,USA,2020:13025-13034.
[8] Song D W,Dong Y S,Li X L. Hierarchical edge refinement network for saliency detection[J]. IEEE Transactions on Image Processing,2021,30:7567-7577.
[9] Wu R M,Feng M Y,Guan W L,et al. A mutual learning method for salient object detection with intertwined multi-supervision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,USA,2019:8150-8159.
[10] Liu J J,Hou Q B,Cheng M M. Dynamic feature integration for simultaneous detection of salient object,edge,and skeleton[J]. IEEE Transactions on Image Processing,2020,29:8652-8667.
[11] He K M,Zhang X Y,Ren S Q,et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,USA,2016:770-778.
[12] Chen L C,Papandreou G,Kokkinos I,et al. DeepLab:Semantic image segmentation with deep convolutional nets,atrous convolution,and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,40(4):834-848.
[13] He K M,Zhang X Y,Ren S Q,et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(9):1904-1916.
[14] Hu J,Shen L,Sun G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,USA,2018:7132-7141.
[15] Peng C,Zhang X Y,Yu G,et al. Large kernel matters—Improve semantic segmentation by global convolutional network[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,USA,2017:4353-4361.
[16] De Boer P T,Kroese D P,Mannor S,et al. A tutorial on the cross-entropy method[J]. Annals of Operations Research,2005,134(1):19-67.
[17] Arbelaez P,Maire M,F(xiàn)owlkes C,et al. Contour detection and hierarchical image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(5):898-916.
[18] Shen W,Zhao K,Jiang Y,et al. Object skeleton extraction in natural images by fusing scale-associated deep side outputs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,USA,2016:222-230.
[19] Yan Q,Xu L,Shi J D,et al. Hierarchical saliency detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland,USA,2013:1155-1162.
[20] Li G,Yu Y. Visual saliency based on multiscale deep features[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston,USA,2015:5455-5463.
[21] Li Y,Hou X D,Koch C,et al. The secrets of salient object segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus,USA,2014:280-287.
[22] Zhang X W,Wang T T,Qi J Q,et al. Progressive attention guided recurrent network for salient object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,USA,2018:714-722.
[23] Liu N,Han J W,Yang M H. Picanet:Learning pixel-wise contextual attention for saliency detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,USA,2018:3089-3098.
[24] Wang W G,Shen J B,Cheng M M,et al. An iterative and cooperative top-down and bottom-up inference network for salient object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,USA,2019:5968-5977.
[25] Qin X B,Zhang Z C,Huang C Y,et al. Basnet:Boundary-aware salient object detection[C]// Proceed-ings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,USA,2019:7479-7489.
[26] Feng M Y,Lu H C,Ding E. Attentive feedback network for boundary-aware salient object detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,USA,2019:1623-1632.
[27] Wang W G,Zhao S Y,Shen J B,et al. Salient object detection with pyramid attention and salient edges[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,USA,2019:1448-1457.
[28] Wu Z,Su L,Huang Q M. Cascaded partial decoder for fast and accurate salient object detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,USA,2019:3907-3916.
[29] Zhou H J,Xie X H,Lai J H,et al. Interactive two-stream decoder for accurate and fast saliency detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle,USA,2020:9141-9150.
[30] Li J X,Pan Z F,Liu Q S,et al. Complementarity-aware attention network for salient object detection[J]. IEEE Transactions on Cybernetics,2020,52(2):873-887.
[31] Mohammadi S,Noori M,Bahri A,et al. CAGNet:Content-aware guidance for salient object detection[J]. Pattern Recognition,2020,103:107303.
[32] Sun L N,Chen Z X,Wu Q M J,et al. AMPNet:Average- and max-pool networks for salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology,2021,31(11):4321-4333.
[33] Achanta R,Hemami S,Estrada F,et al. Frequency-tuned salient region detection[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami,USA,2009:1597-1604.
[34] Fan D P,Cheng M M,Liu Y,et al. Structure-measure:A new way to evaluate foreground maps[C]// Proceedings of the IEEE International Conference on Computer Vision. Venice,Italy,2017:4548-4557.
[35] Li X,Zhao L,Wei L,et al. DeepSaliency:Multi-task deep neural network model for salient object detection[J]. IEEE Transactions on Image Processing,2016,25(8):3919-3930.
Saliency Detection Network Based on Edge Detection and Skeleton Extraction
Yang Aiping1, Cheng Simeng1, Wang Jinbin1, Song Shangyang1, Ding Xuewen2
(1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2. School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China)
Recently, considerable progress has been made in salient object detection based on joint multitask learning. However, false and missed detections persist owing to differences in optimization objectives and feature domains among the tasks, which leave current networks unable to adequately identify features such as saliency and object boundaries. Herein, we propose a multitask-assisted saliency detection network based on edge detection and skeleton extraction, comprising a feature extraction subnetwork, an edge detection subnetwork, a skeleton extraction subnetwork, and a saliency filling subnetwork. The feature extraction subnetwork extracts multilevel image features using a pretrained ResNet101 model. The edge detection subnetwork fuses the first three layers of features to completely retain the salient edges. The skeleton extraction subnetwork fuses the last two layers of features to accurately locate the centers of salient objects. Unlike current networks, we train the two subnetworks separately on an edge detection dataset and a skeleton extraction dataset and preserve the best models, which serve as pretrained models to assist the saliency detection task. Furthermore, to reduce the discrepancy between optimization objectives and feature domains, the saliency filling subnetwork is designed to fuse and nonlinearly map the extracted edge and skeleton features. Experimental results on four datasets show that the proposed method not only restores missing salient regions effectively but also outperforms other methods.
edge detection;skeleton extraction;multitask;saliency detection network
10.11784/tdxbz202204052
TP391
A
0493-2137(2023)08-0823-08
Received: 2022-04-29; revised: 2022-12-16.
Yang Aiping (1977— ), female, Ph.D., associate professor. Email: m_bigm@tju.edu.cn
Corresponding author: Yang Aiping, yangaiping@tju.edu.cn.
Supported by the National Natural Science Foundation of China (No. 62071323, No. 61632018, No. 61771329) and the Tianjin Science and Technology Planning Project (No. 20YDTPJC01110).
(Executive editor: Sun Lihua)