

Method for estimating the image depth of tomato plant based on self-supervised learning


Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng, Wu Qiong


(College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China)

Abstract: Depth estimation is the key to 3D scene reconstruction and target localization in intelligent agricultural machinery vision systems. This paper proposes a self-supervised depth estimation network model for tomato plant images that takes a binocular image pair directly as input and estimates the depth of every pixel. Three channel-wise group convolution modules were designed and used to build a convolutional auto-encoder as the backbone of the depth estimation network. To remedy the inadequacy of hand-crafted features for measuring the similarity of two images, a convolutional feature similarity loss was introduced as a component of the loss function. The results show that the convolutional auto-encoder built from group convolution modules effectively improves the accuracy of the estimated disparity maps; the convolutional feature similarity loss significantly improves the depth estimation accuracy for tomato plant images, and accuracy rises with the number of convolution module layers involved in computing the loss, although beyond 4 layers further gains become negligible. When the binocular sampling distance is within 9.0 m, the root mean square error (RMSE) and mean absolute error (MAE) of the estimated chessboard corner distances are below 2.5 and 1.8 cm respectively; within 3.0 m they are below 0.7 and 0.5 cm. The model runs at 28.0 frames/s. Compared with existing work, the two errors are reduced by 33.1% and 35.6% respectively and the computation speed is increased by 52.2%. This study can serve as a reference for the design of intelligent agricultural machinery vision systems.

Keywords: image processing; convolutional neural network; algorithm; self-supervised learning; depth estimation; disparity; deep learning; tomato

0 Introduction

The vision system is a key component through which intelligent agricultural machinery perceives its environment [1]. In solar greenhouses or field environments, autonomous machinery must plan its route [2] and avoid obstacles [3], and automated operations such as fruit harvesting [4] and target-oriented spraying [5] require it to recognize and locate its working targets. All of these demand that the vision system support target localization and 3D scene reconstruction, for which the acquisition of depth information is essential.

Stereo matching based on image features, and depth sensors such as LiDAR (light detection and ranging) and Kinect, are commonly used to acquire plant depth information. Stereo matching algorithms use the local features around each pixel to match feature points between the binocular images under the constraint of an energy function, thereby recovering depth. Zhai et al. [6] used the Rank transform of gray-scale images as the matching primitive to reconstruct farmland scenes in 3D, with an average mismatching rate of 15.45%. Zhu et al. [7] segmented the background of binocular cotton images, extracted feature points with the scale-invariant feature transform, and matched them with a best-bin-first algorithm to obtain the 3D coordinates of the cotton point cloud. Because field plant images are uniform in color and texture, the features extracted by traditional operators are poorly distinguishable and mismatching is severe. Hämmerle et al. [8] used LiDAR to acquire crop surface depth for surface modeling. Cheng et al. [9] scanned peanut canopies with LiDAR to obtain 3D point clouds and fitted polynomial curves to derive canopy height characteristics. LiDAR acquires high-accuracy depth quickly, but the equipment is expensive [3] and cannot directly capture RGB images for target recognition. Xiao et al. [2] used the RGB images provided by a Kinect to identify leaf-wall regions, matched them with the device's depth map, and measured the average distance to the leaf wall to plan a route. The Kinect captures RGB images and pixel-aligned depth maps simultaneously, but the sensor is based on time-of-flight technology: it is easily disturbed by sunlight, noisy, and has a small field of view, making stable operation under complex field conditions difficult. Digital cameras are mature, stable, and inexpensive; if progress can be made in image-based depth estimation, they would be ideal perception components for intelligent agricultural machinery. In recent years, convolutional neural networks (CNN) have achieved breakthroughs in several computer vision fields such as object recognition [10] and semantic segmentation [11], and are increasingly applied to depth estimation [12]. Mayer et al. [13] predicted per-pixel depth with a supervised CNN model. The main problem with supervised methods is that per-pixel depth annotation of image samples is very difficult [14]; although depth sensors such as the Kinect can capture RGB images and depth simultaneously, the depth data contain noise that degrades model training [15]. Godard et al. [16] proposed a self-supervised depth estimation model based on a left-right consistency constraint, which achieved good results on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) dataset [17] and is currently the most accurate self-supervised depth estimation method [18]. Compared with datasets such as KITTI, field plant images show little variability, so the methodology and applicability of CNNs for depth estimation on such images deserve further investigation.

In view of this, and addressing the practical need of intelligent agricultural machinery vision systems for depth information, this paper takes depth estimation of tomato plant images as a case study and, building on existing work, proposes a self-supervised depth estimation network model that takes binocular images directly as input and estimates the depth of every pixel. A convolutional auto-encoder serves as the backbone of the model. To address the shortcomings of existing loss functions, a convolutional feature similarity loss is introduced as part of the network loss. Using reconstructed-image similarity and chessboard corner distance estimation errors as criteria, the effectiveness of the method for tomato plant image depth estimation is verified, providing a reference for the design of intelligent agricultural machinery vision systems.

1 Construction of the binocular tomato plant image dataset

1.1 Image acquisition equipment and binocular camera calibration

1.2 Experimental conditions and dataset construction

The binocular tomato plant images were collected in May 2018 in a Liaoshen IV energy-saving solar greenhouse (60 m long, 10 m wide) at the experimental base of Shenyang Agricultural University. The tomato cultivar was "Ruite Fenna", grown in substrate with hanging vines, at a plant spacing of 0.3 m and a row spacing of 1.0 m; the plants were at the fruiting stage, about 2.7 m tall. Images were collected between 9:00 and 12:00 on sunny, partly cloudy, and overcast days. Plants on both sides were first imaged from between the rows; constrained by the row spacing, the horizontal distance from the camera to the plant row was 0.5-1.0 m. The scene ahead of the camera along the row direction was also sampled. In total, 12,000 binocular image pairs of tomato plants were collected. Using the intrinsic and extrinsic camera parameters, the images were epipolar-rectified with Bouguet's method [20] implemented in OpenCV, so that the optical axes of the two views are parallel and the pixels of the same spatial point are row-aligned across the pair. The rectified pairs constitute the binocular tomato plant image dataset, of which a random 80% was used as the training set for the depth estimation networks below and the remaining 20% as the test set; each depth estimation network experiment was repeated 5 times. A second dataset of binocular plant images containing a chessboard calibration target was built with the same acquisition and rectification procedure: a chessboard with known cell size was placed in the scene and imaged at camera-to-target distances of 0.5-3.0, 3.0-6.0, and 6.0-9.0 m, always keeping the whole board visible in both views. In total, 1,500 pairs containing the board were collected, with equal numbers across weather conditions and sampling distances.
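As an illustration of the rectification step, the sketch below shows how Bouguet's algorithm is typically invoked through OpenCV; the variable names and the calibration inputs (K1, d1, K2, d2, R, T) are assumptions for illustration, not the paper's actual code.

```python
# A minimal sketch of epipolar rectification with OpenCV's implementation
# of Bouguet's algorithm, assuming the intrinsics (K1, K2), distortion
# coefficients (d1, d2) and stereo extrinsics (R, T) come from calibration.
import cv2

def rectify_pair(img_l, img_r, K1, d1, K2, d2, R, T):
    """Rectify a stereo pair so that epipolar lines become horizontal
    and corresponding pixels are row-aligned across the two views."""
    h, w = img_l.shape[:2]
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K1, d1, K2, d2, (w, h), R, T, alpha=0)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return rect_l, rect_r
```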

2 Self-supervised depth estimation method for plant images

2.1 Binocular disparity estimation network model

Note: Il and Ir denote the left and right images; Dl and Dr the left and right disparity maps; Îl and Îr the reconstructed left and right images; S the image sampler. Same below.
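In schematic terms, the forward pass implied by these symbols can be sketched as follows (PyTorch is used purely for illustration; the paper's implementation used CNTK, and the sampler's sign convention follows Godard et al. [16] as an assumption):

```python
# Hedged sketch of the self-supervised forward pass: the network sees the
# stereo pair and predicts both disparity maps, and a differentiable
# sampler S reconstructs each view from the opposite image.
import torch

def forward_pass(dnn, sampler, I_l, I_r):
    # predict per-pixel disparities Dl, Dr from the concatenated pair
    D_l, D_r = dnn(torch.cat([I_l, I_r], dim=1))
    I_l_hat = sampler(I_r, -D_l)  # reconstruct left view from right image
    I_r_hat = sampler(I_l, D_r)   # reconstruct right view from left image
    return D_l, D_r, I_l_hat, I_r_hat
```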

2.2 Design of the network backbone

A convolutional auto-encoder (CAE) was adopted as the structure of the DNN. CAEs are widely used for tasks such as semantic segmentation [11] and depth recovery [12-15]; in those studies the CAE is stacked from ordinary convolutional and pooling layers, giving many parameters and a heavy computational load. Recent work [21-22] instead builds CNNs from modular convolution blocks. Zhou et al. [23] designed a structure called the channel-wise group convolution (CWGC) module; CNNs based on it achieved high accuracy in tomato organ classification and recognition, and, at the same width and depth, the structure effectively reduces the number of network parameters compared with ordinary convolution. For the depth recovery task, this paper extends the existing CWGC module with up-sampling and down-sampling capabilities (Fig. 2).

The CWGC module in Fig. 2a consists mainly of 4 identical convolution groups. Each group extracts features from the input to produce feature maps, which are then concatenated along the channel dimension and passed through a batch normalization (BN) layer and an ELU (exponential linear unit) [24] to form the module output. The convolutional layers within a group connect only to the preceding and following layers of the same group, and a 1×1 convolution (conv1×1) serves as a bottleneck layer to reduce the parameter count, deepen the network, and improve semantic feature extraction. Three types of convolution group were designed for the CWGC module. Fig. 2b shows the scale-preserving group, which pads the input so that the output keeps the same spatial size (width × height); Fig. 2c shows the down-sampling group, in which conv3×1 and conv1×3 use anisotropic strides so that the width and height of the output feature maps are halved; Fig. 2d shows the up-sampling group, which places a stride-2 transposed convolution layer (deconv3×3) after conv1×1 so that the spatial size of the output is doubled. Using the three group types, the CWGC module can keep, reduce, or enlarge the spatial scale between input and output.

Note: conv1×1,h etc. denote a convolutional layer with kernel size 1×1 and h output channels; s2,1 indicates horizontal and vertical strides of 2 and 1, respectively; deconv denotes transposed convolution. Same below.

Fig.2 Convolutional block
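A hedged PyTorch sketch of the scale-preserving variant of the module follows; the internal layer order and channel widths within each convolution group are assumptions reconstructed from the description above, not the paper's exact configuration.

```python
# Sketch of a CWGC module (Fig. 2a/2b): four identical convolution groups,
# each with a 1x1 bottleneck followed by separable 3x1/1x3 convolutions,
# whose outputs are concatenated channel-wise and passed through BN + ELU.
import torch
import torch.nn as nn

class ConvGroup(nn.Module):
    def __init__(self, in_ch, h):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, h, kernel_size=1),                    # 1x1 bottleneck
            nn.Conv2d(h, h, kernel_size=(3, 1), padding=(1, 0)),   # conv3x1
            nn.Conv2d(h, h, kernel_size=(1, 3), padding=(0, 1)),   # conv1x3
        )

    def forward(self, x):
        return self.body(x)

class CWGCBlock(nn.Module):
    def __init__(self, in_ch, h, groups=4):
        super().__init__()
        self.groups = nn.ModuleList(ConvGroup(in_ch, h) for _ in range(groups))
        self.post = nn.Sequential(nn.BatchNorm2d(h * groups), nn.ELU())

    def forward(self, x):
        # every group sees the full input; outputs merge along channels
        return self.post(torch.cat([g(x) for g in self.groups], dim=1))
```

The down- and up-sampling variants would replace the 3×1/1×3 convolutions with their strided forms, or append a stride-2 deconv3×3 after the bottleneck, respectively.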

Note: CWGC-8D etc. denotes a CWGC module using the down-sampling convolution group with h=8; CWGC-128U etc. denotes a CWGC module using the up-sampling convolution group with h=128; Skip 1 etc. denotes a skip connection. Same below.

2.3 Definition of the depth estimation network loss function

The definition of the loss function is the key to the optimized training of a self-supervised image depth estimation network. The loss function of this paper combines the four terms defined in sections 2.3.1-2.3.4 below.
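In schematic form (a hedged sketch: the weighting coefficients and the exact notation are assumptions, following the four terms defined below and the weighted-sum form used by Godard et al. [16]):

```latex
% Hedged sketch: total loss as a weighted sum, over both views, of the
% reconstruction (ap), convolutional feature similarity (cf), disparity
% smoothness (ds) and left-right consistency (lr) terms; the weights
% \alpha_* are assumptions.
L = \alpha_{ap}\bigl(L_{ap}^{l}+L_{ap}^{r}\bigr)
  + \alpha_{cf}\bigl(L_{cf}^{l}+L_{cf}^{r}\bigr)
  + \alpha_{ds}\bigl(L_{ds}^{l}+L_{ds}^{r}\bigr)
  + \alpha_{lr}\bigl(L_{lr}^{l}+L_{lr}^{r}\bigr)
```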

2.3.1 Image reconstruction loss

2.3.2 Image convolutional feature similarity loss

In the L1-form image reconstruction loss, the error gradient is determined solely by the photometric difference of corresponding pixels. The SSIM index combines luminance, contrast, and structural similarity. Plant organs are not ideal Lambertian surfaces; their waxy cuticle produces a degree of specular reflection. Moreover, because the left and right cameras differ in position, pose, and physical characteristics, the same spatial point may be imaged with different photometric values in the two views. Even if the disparity map predicted by the DNN is accurate, an image reconstructed by sampling from the other view will differ from the original. Hand-crafted features such as those in Eq. (3) are therefore insufficient to measure the similarity of two images. Zeiler et al. [25] showed that the lower convolutional layers of a trained classification CNN learn low-level image features such as color, edges, and texture. Unlike hand-crafted features, CNN convolutional features are learned from large numbers of samples; because of the large number of kernels, the extracted features are more diverse and complex, carry semantic information, and are less affected by environmental variation. This paper therefore additionally measures image similarity by the similarity of the low-level convolutional features produced by a well-trained classification CNN.
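As a sketch of this idea, the similarity term can be computed by passing both the reconstructed and the original image through the frozen low-level modules of a trained classifier and accumulating per-layer differences; `feat_layers` is an assumed indexable sequence of the classifier's first convolution modules, and the L1 form and default layer count are assumptions.

```python
# Hedged sketch of the convolutional feature similarity loss: compare the
# low-level feature maps of the reconstructed and reference images under a
# frozen, well-trained classification network (CWGCNet in the paper).
import torch

def conv_feature_loss(feat_layers, img_rec, img_ref, num_layers=4):
    loss = 0.0
    x, y = img_rec, img_ref.detach()  # gradient flows via the reconstruction only
    for layer in feat_layers[:num_layers]:
        x, y = layer(x), layer(y)
        loss = loss + torch.mean(torch.abs(x - y))  # L1 form is an assumption
    return loss
```

The default of 4 layers echoes the finding below that accuracy gains saturate beyond 4 convolution modules.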

To compute the convolutional feature tensors, a classification CNN, named CWGCNet, was likewise built from CWGC modules (Fig. 4). The network consists of ordinary convolutional layers, CWGC modules, max-pooling, dropout, global average pooling, and a Softmax function, and contains 35 convolutional layers in total.

2.3.3 Disparity smoothness loss

Note: CWGC-16 etc. denotes a CWGC module using the scale-preserving convolution group with h=16; max-pool2×2 denotes a max-pooling layer with a 2×2 pooling window.

2.3.4 Left-right disparity consistency loss

2.4 Differentiable image sampler

The depth estimation network is optimized by adjusting its weight parameters through gradient descent, with the objective of minimizing the loss function L. This requires every module of the neural network, including the image sampler, to be differentiable. Since epipolar rectification row-aligns the corresponding pixels of the same spatial point in the two views, linear interpolation suffices to resample and reconstruct the images and disparity maps, as illustrated in Fig. 5.

Fig.5 Illustration of the differentiable image sampling process
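A minimal sketch of such a sampler follows (PyTorch as an assumption): because rectification row-aligns the pair, reconstruction reduces to 1-D linear interpolation along each row at positions offset by the disparity, and the interpolation weights keep the operation differentiable with respect to the disparity.

```python
# Hedged sketch: 1-D differentiable sampling along image rows. Gradients
# flow to `disp` through the fractional interpolation weights.
import torch

def sample_1d(img, disp):
    """img: (B, C, H, W); disp: (B, 1, H, W), signed disparity in pixels.
    Returns img resampled at horizontal positions x + disp."""
    b, c, h, w = img.shape
    xs = torch.arange(w, dtype=img.dtype, device=img.device).view(1, 1, 1, w)
    x = (xs + disp).clamp(0, w - 1)          # target sampling positions
    x0 = x.floor().long().clamp(0, w - 2)    # index of the left neighbour
    frac = x - x0.to(img.dtype)              # linear interpolation weight
    x0 = x0.expand(b, c, h, w)
    left = torch.gather(img, 3, x0)          # value at floor(x)
    right = torch.gather(img, 3, x0 + 1)     # value at floor(x) + 1
    return (1 - frac) * left + frac * right
```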

2.5 Criteria for disparity estimation accuracy

The more accurate the disparity map estimated by the DNN, the more similar the image reconstructed by the sampler from that disparity map is to the target image. This paper therefore first uses three image similarity metrics that agree closely with subjective evaluation, FSIM (feature similarity index) [27], IW-SSIM (information content weighted SSIM) [28], and GSIM (gradient similarity index) [29], as indirect criteria for disparity map accuracy; for all three, larger values indicate higher image similarity and hence a more accurate disparity map.

3 Training and testing of the depth estimation network model

3.1 Overall structure and implementation of the depth estimation network

3.2 Training of CWGCNet

The CWGCNet classifier was trained on the ImageNet1000 [31] training set following [32], using mini-batch gradient descent with random-crop data augmentation. It was then evaluated on the ImageNet1000 test set and compared with two typical classification CNNs, AlexNet [33] and VGG-16 [32], in terms of top-1 and top-5 error rates (the proportion of test samples whose true class is not among the 1 or 5 classes with the highest predicted probability) and the number of weight parameters (the total number of weights in all convolution kernels and fully connected layers, determined by the specific network structure; same below). The results are shown in Table 1.

Table 1 Classification performance of CWGCNet compared with two typical CNN networks

As Table 1 shows, the top-1 error of CWGCNet on ImageNet1000 is 30.4% and 3.3% lower than those of AlexNet and VGG-16 respectively, while its weight parameter count is only 4.5% and 6.5% of theirs, indicating stronger image feature extraction, more efficient kernels, and less parameter redundancy. A tomato plant image was fed into CWGCNet and some of the feature maps output by its first 2 convolution modules were visualized (Fig. 6).

Fig. 6 shows that the first 2 convolution modules of CWGCNet output feature maps of varied appearance, indicating that the network has learned kernels that extract multiple types of image features: the first-layer feature maps mainly reflect color, while the second-layer maps highlight edges and textures. Therefore, on top of the comparisons of luminance, contrast, structure, and photometric difference between two images contained in Eq. (3), Eq. (4), built from convolutional feature maps, introduces a further, more diverse comparison of image features into the training of the depth estimation network.

Fig.6 Input image and some of its corresponding convolutional feature maps

3.3 Training of the depth estimation network

The network was trained as in [16], using the Adam (adaptive moment estimation) optimizer with first-moment exponential decay rate β1 = 0.9 and second-moment exponential decay rate β2 = 0.999, a mini-batch size of 8, and an initial learning rate of 10^-3, reduced to 10^-4 after 10 epochs and then divided by 10 every further 20 epochs. After 60 epochs of training, the network loss converged to a stable value.
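The schedule can be expressed compactly as below (a sketch in PyTorch, whereas the paper trained with CNTK; `model` is a placeholder for the depth estimation network).

```python
# Hedged sketch of the optimizer and learning-rate schedule described
# above; the mini-batch size of 8 would be set in the data loader.
import torch

model = torch.nn.Linear(1, 1)  # placeholder for the depth estimation DNN

def lr_at(epoch):
    """1e-3 for the first 10 epochs, 1e-4 afterwards,
    then divided by 10 every further 20 epochs."""
    if epoch < 10:
        return 1e-3
    return 1e-4 * 0.1 ** ((epoch - 10) // 20)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999))  # beta1 = 0.9, beta2 = 0.999
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: lr_at(e) / 1e-3)  # scale of the base LR
```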

3.4 Testing and analysis of the depth estimation network

Table 2 Effect of the number of convolution module layers on disparity estimation accuracy

Note: Data are mean ± SE. Values followed by different lowercase letters within a column differ significantly at the 0.05 level. Same below.

Table 3 Performance comparison of depth estimation methods

Note: Combination A denotes a model consisting of the network structure of this paper and the loss function of [16]; Combination B denotes a model composed of the network structure of [16] and the loss function of this paper.

As shown in Table 3, compared with the method of Godard et al., the proposed method scores significantly higher on all three image similarity metrics (FSIM, IW-SSIM, and GSIM), and its chessboard corner distance errors are significantly lower: RMSE is reduced by 33.1% and MAE by 35.6%, while the coefficient of determination R2 increases significantly, indicating that the disparity maps estimated by the proposed method are more accurate. The depth estimation network built on CWGC-CAE runs at 28.0 frames/s, 52.2% faster than the network of Godard et al. Comparing Godard et al.'s method with Combination A shows that CWGC-CAE significantly outperforms their network structure on both the image similarity metrics and chessboard corner distance accuracy, with only 16.9% as many weight parameters, demonstrating the advantage of CWGC-CAE for tomato plant image depth estimation. Comparing Godard et al.'s method with Combination B shows that Combination B improves significantly on FSIM and IW-SSIM and reduces the chessboard corner distance RMSE and MAE by 32.1% and 33.3% respectively, confirming that the image convolutional feature similarity loss introduced in this paper significantly improves the accuracy of plant image depth estimation.

The final depth estimation model was likewise used to estimate disparity maps for some of the images in the tomato binocular test set (Fig. 7), and was tested on the chessboard-containing binocular plant image dataset to examine how illumination conditions and sampling distance affect the accuracy of estimated chessboard corner distances (Table 4).

Table 4 shows that illumination has no significant effect on the chessboard corner distance errors, indicating that the plant image depth estimation model is robust to illumination changes. The sampling distance of the chessboard, by contrast, has a significant effect: errors grow with distance. At 0.5-3.0 m, the RMSE is 6.49 mm and the MAE 4.36 mm, i.e., below 0.7 and 0.5 cm respectively; at 6.0-9.0 m, the RMSE is 24.63 mm and the MAE 17.90 mm, i.e., below 2.5 and 1.8 cm respectively.

Fig.7 Depth estimation results

Table 4 Effect of illumination conditions and sampling distance on the accuracy of estimated chessboard corner distances

Note: Values followed by different lowercase letters within a row differ significantly at the 0.05 level.

4 Conclusions

This paper proposed a self-supervised depth estimation network model for tomato plant images, built a convolutional auto-encoder as its backbone, and introduced an image convolutional feature similarity loss as part of the loss function. The model was trained and tested on binocular tomato plant images, with image similarity and chessboard corner distance estimation errors as criteria. The results show: 1) The shallow convolutions of the classification network designed from channel-wise group convolution modules extract low-level image features of tomato plants. Compared with a model that does not use the image convolutional feature similarity loss, the model using it reduces the RMSE and MAE of estimated chessboard corner distances by 32.1% and 33.3% respectively, so the loss contributes significantly to the accuracy of tomato plant image depth estimation; accuracy rises with the number of convolution modules involved in the similarity loss, but beyond 4 layers further increases bring no obvious gains. 2) The sampling distance affects depth estimation accuracy: within 9.0 m, the RMSE and MAE of estimated chessboard corner distances are below 2.5 and 1.8 cm respectively; within 3.0 m, below 0.7 and 0.5 cm. The model runs at 28.0 frames/s. 3) Compared with existing work, the model reduces RMSE and MAE by 33.1% and 35.6% and increases computation speed by 52.2%, significantly improving both depth estimation accuracy and speed.

[1] Xiang Rong, Ying Yibin, Jiang Huanyu. Development of real-time recognition and localization methods for fruits and vegetables in field[J]. Transactions of the Chinese Society for Agricultural Machinery, 2013, 44(11): 208-223. (in Chinese with English abstract)

[2] Xiao Ke, Gao Guandong, Ma Yuejin. Pesticide spraying route planning algorithm for grapery based on Kinect video technique[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(24): 192-199. (in Chinese with English abstract)

[3] He Yong, Jiang Hao, Fang Hui, et al. Research progress of intelligent obstacle detection methods of vehicles and their application on agriculture[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(9): 21-32. (in Chinese with English abstract)

[4] Mo Yuda, Zou Xiangjun, Ye Min, et al. Hand-eye calibration method based on Sylvester equation deformation for lychee harvesting robot[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(4): 47-54. (in Chinese with English abstract)

[5] Zhai Changyuan, Zhao Chunjiang, Ning Wang, et al. Research progress on precision control methods of air-assisted spraying in orchards[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(10): 1-15. (in Chinese with English abstract)

[6] Zhai Zhiqiang, Du Yuefeng, Zhu Zhongxiang, et al. Three-dimensional reconstruction method of farmland scene based on Rank transformation[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(20): 157-164. (in Chinese with English abstract)

[7] Zhu Rongjie, Zhu Yinghui, Wang Ling, et al. Cotton positioning technique based on binocular vision with implementation of scale-invariant feature transform algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(6): 182-188. (in Chinese with English abstract)

[8] Hämmerle M, Höfle B. Effects of reduced terrestrial LiDAR point density on high-resolution grain crop surface models in precision agriculture[J]. Sensors, 2014, 14(12): 24212-24230.

[9] Cheng Man, Cai Zhenjiang, Ning Wang, et al. System design for peanut canopy height information acquisition based on LiDAR[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(1): 180-187. (in Chinese with English abstract)

[10]Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.

[11]Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//European Conference on Computer Vision. Springer, Cham, 2018: 833-851.

[12]Liu F, Shen C, Lin G, et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(10): 2024-2039.

[13] Mayer N, Ilg E, Häusser P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]//Computer Vision and Pattern Recognition. IEEE, 2016: 4040-4048.

[14]Garg R, Vijay K B G, Carneiro G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C]// European Conference on Computer Vision. Springer, Cham, 2016: 740-756.

[15] Kundu J N, Uppala P K, Pahuja A, et al. AdaDepth: Unsupervised content congruent adaptation for depth estimation[EB/OL]. [2018-06-07] https://arxiv.org/pdf/1803.01599.pdf.

[16]Godard C, Aodha O M, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C]// Computer Vision and Pattern Recognition. IEEE, 2017: 6602-6611.

[17]Geiger A, Lenz P, Stiller C, et al. Vision meets robotics: The KITTI dataset[J]. International Journal of Robotics Research, 2013, 32(11): 1231-1237.

[18]Poggi M, Tosi F, Mattoccia S. Learning monocular depth estimation with unsupervised trinocular assumptions[C]// International Conference on 3D Vision (3DV). IEEE, 2018: 324-333.

[19]Zhang Z. A flexible new technique for camera calibration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1330-1334.

[20]Bouguet J Y, Perona P. Closed-form camera calibration in dual-space geometry[C]//European Conference on Computer Vision, 1998.

[21]Zhang T, Qi G J, Xiao B, et al. Interleaved group convolutions for deep neural networks[C]//International Conference on Computer Vision (ICCV), 2017.

[22]Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]//Proceedings of Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), 2017.

[23] Zhou Yuncheng, Xu Tongyu, Deng Hanbing, et al. Real-time recognition of main organs in tomato based on channel wise group convolutional network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(10): 153-162. (in Chinese with English abstract)

[24]Shah A, Kadam E, Shah H, et al. Deep residual networks with exponential linear unit[C]//International Symposium on Computer Vision and the Internet. ACM, 2016: 59-65.

[25]Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]//Computer Vision and Pattern Recognition. IEEE, 2014.

[26]Heise P, Klose S, Jensen B, et al. PM-Huber: PatchMatch with huber regularization for stereo matching[C]//IEEE International Conference on Computer Vision. IEEE, 2014: 2360-2367.

[27]Zhang L, Zhang L, Mou X, et al. FSIM: A feature similarity index for image quality assessment[J]. IEEE Transactions on Image Processing, 2011, 20(8): 2378-2386.

[28]Wang Z, Li Q. Information content weighting for perceptual image quality assessment[J]. IEEE Transactions on Image Processing, 2011, 20(5): 1185-1198.

[29]Liu A, Lin W, Narwaria M. Image quality assessment based on gradient similarity[J]. IEEE Transactions on Image Processing, 2012, 21(4): 1500-1512.

[30] Agarwal A, Akchurin E, Basoglu C, et al. The Microsoft cognitive toolkit[EB/OL]. [2018-06-01] https://docs.microsoft.com/en-us/cognitive-toolkit/.

[31]Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database[C]//Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009: 248-255.

[32] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations (ICLR), 2015.

[33]Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc. 2012: 1097-1105.

[34]Yang Y, Zhong Z, Shen T, et al. Convolutional neural networks with alternately updated clique[C]//Computer Vision and Pattern Recognition. IEEE, 2018.

Method for estimating the image depth of tomato plant based on self-supervised learning

Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng, Wu Qiong

(College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China)

Depth estimation is critical to 3D reconstruction and object localization in intelligent agricultural machinery vision systems, and a common approach to it is stereo matching. Traditional stereo matching methods rely on manually designed low-level image features. Because the color and texture of field plant images are highly uniform, such hand-crafted features are poorly distinguishable and mismatching occurs, compromising the accuracy of the depth map. While supervised convolutional neural networks (CNN) can estimate the depth of each pixel in a plant image directly, annotating the depth data is expensive. In this paper, we present a depth estimation model for tomato plant images based on self-supervised learning. The depth estimation task is cast as an image reconstruction problem: dense disparity maps are estimated indirectly with a rectified stereo pair of images as the network input, and bilinear interpolation samples the input images to reconstruct the warped counterparts. We developed three channel-wise group convolution (CWGC) modules, namely the scale-invariant convolution module, the down-sampling convolution module, and the up-sampling convolution module, and used them to construct the convolutional auto-encoder, the key infrastructure of the depth estimation method. Considering the inadequacy of manual features for comparing image similarity, we used the loss in image convolutional feature similarity as one objective of network training, with a CWGC-based CNN classification network (CWGCNet) developed to extract low-level features automatically. In addition to the convolutional feature similarity loss, the whole training loss also includes the image appearance matching loss, the disparity smoothness loss, and the left-right disparity consistency loss. Stereo pairs of tomato images were sampled with a binocular camera in a greenhouse and, after epipolar rectification, used for training and testing the depth estimation model. Using the Microsoft Cognitive Toolkit (CNTK), the CWGCNet and the depth estimation network were implemented in Python, and both training and testing were conducted on a computer with a Tesla K40c GPU (graphics processing unit). The results show that the shallow convolutional layers of CWGCNet successfully extract diverse low-level image features for computing the convolutional feature similarity loss. The convolutional auto-encoder developed in this paper significantly improves the disparity maps estimated by the depth estimation model, and the convolutional feature similarity loss has a remarkable effect on the accuracy of the estimated image depth, which increases with the number of convolution modules used in computing the loss. When sampled within 9.0 m, the root mean square error (RMSE) and the mean absolute error (MAE) of the corner distances estimated by the model are less than 2.5 cm and 1.8 cm respectively, while within 3.0 m, the errors are less than 0.7 cm and 0.5 cm respectively. The coefficient of determination (R2) of the proposed model is 0.8081, and the test speed is 28 fps (frames per second). Compared with existing models, the proposed model reduces the RMSE and MAE by 33.1% and 35.6% respectively, while increasing the calculation speed by 52.2%.

Keywords: image processing; convolutional neural network; algorithms; self-supervised learning; depth estimation; disparity; deep learning; tomato


Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng, Wu Qiong. Method for estimating the image depth of tomato plant based on self-supervised learning[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(24): 173-182. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2019.24.021 http://www.tcsae.org

Received: 2018-11-01

Revised: 2019-12-01

Funding: Natural Science Foundation of Liaoning Province (20180551102); National Natural Science Foundation of China (31601218)

Author: Zhou Yuncheng, PhD, associate professor, engaged in research on machine learning applications in agricultural information processing. Email: zhouyc2002@163.com

doi: 10.11975/j.issn.1002-6819.2019.24.021

CLC number: TP183    Document code: A    Article ID: 1002-6819(2019)-24-0173-10
