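·Agricultural Aviation Engineering·
UAV collection method for farmland node data based on deep reinforcement learning
Hu Jie1,3, Zhang Yali2,3※, Wang Tuan1, Wang Mengcheng1, Lan Yubin1,3, Zhang Zhixun2
(1. College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China; 2. College of Engineering, South China Agricultural University, Guangzhou 510642, China; 3. National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou 510642, China)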
Using UAVs to collect data from farmland sensor nodes avoids the problems caused by repeated multi-hop forwarding between network nodes: depleted node batteries, premature death of nodes near the gateway, and a shortened network lifetime. Because data from adjacent sensors may be redundant and a UAV can cover several nodes at once, this study optimizes the UAV data-collection route and scheme for two cases, partial-node data collection under redundant coverage and all-node data collection, so as to reduce UAV energy consumption and shorten mission completion time. In the partial-node scenario under redundant coverage, a Dueling Double Deep Q Network (DDDQN) optimizes node selection and collection order so that the collected data meet the coverage requirement while UAV energy efficiency is maximized. Simulation results show that, for the same sensing-coverage requirement, this algorithm shortens the flight distance by 1.21 km and reduces energy consumption by 27.9% compared with the Deep Q Network (DQN). In the all-node scenario, a two-level joint deep reinforcement learning method (Double Deep Reinforcement Learning, DDRL) optimizes the UAV hovering positions and their order so that the total energy consumed to complete the collection task is minimized. Simulation results show that, when the data volume per node is below 160 kB, across different node counts and UAV flight speeds, this method saves at least 6.3% of total energy compared with the classical Particle Swarm Optimization-Traveling Salesman Problem (PSO-TSP) algorithm and the Minimized Energy Flight Control (MEFC) algorithm. Field experiments show that, compared with PSO-TSP, the DDRL-based collection method reduces total UAV energy consumption by 11.5%. The results can provide a reference for UAV collection of wireless sensor node data in large fields.
UAV; data collection; deep reinforcement learning; node sensing redundancy; DQN; DRL
Agricultural sensors of various kinds monitor the crop growth environment and support precision irrigation, fertilization, and pest and disease control[1-2]. In remote areas that lack network infrastructure, collecting data from wireless sensor nodes in farmland is difficult[3-4]. Collecting in-field sensor node data with mobile devices has therefore become one solution[5-6]. Compared with ground-based mobile devices, UAVs are not constrained by the terrain, do not damage crops, and suffer less signal obstruction, which makes them an advantageous means of collecting sensor node data[7]. Agricultural UAVs currently perform spraying, seeding, and similar operations along a boustrophedon (back-and-forth) pattern. When collecting data from randomly deployed sensor nodes, however, the data of adjacent nodes may be redundant[8-9], and a UAV at one hovering point may cover several nodes[10]. The UAV collection route and scheme therefore need to be optimized to reduce UAV energy consumption and shorten mission completion time.
Researchers in China and abroad have studied UAV collection of sensor node data; most optimization schemes aim to reduce energy consumption[11-12], minimize mission completion time[13], or minimize trajectory length[14-17]. Luo et al.[11] proposed a data-collection scheme for smart farms that determines clusters and cluster heads from the Received Signal Strength Indication (RSSI) of the sensor nodes and searches for the best trajectory with an improved Dijkstra algorithm and a genetic algorithm (GA). Ben Ghorbel et al.[12] proposed a solution for collecting data from wireless sensor network nodes with a UAV that reduces communication energy and UAV flight energy simultaneously. Just et al.[14], targeting UAV collection of node data over large areas, combined the concept of time slots with a no-fly list to synchronize the UAV path with each node's activation period, greatly shortening flight distance and flight time. Zhang et al.[15] proposed a Hierarchical Deep Reinforcement Learning (HDRL) algorithm for path planning in rechargeable multi-UAV data-collection scenarios that minimizes the total UAV flight time. Jiang et al.[16] proposed a Q-learning-based method for UAV-assisted data collection from small-scale wireless sensor networks that optimized the UAV mission completion time and effective data volume and improved UAV energy efficiency. Yi et al.[17] studied deep reinforcement learning (DRL)-based UAV-assisted IoT data collection with optimal Age of Information (AoI), obtaining the optimal UAV flight trajectory and the transmission schedule of the sensor nodes. Refs.[15-18] apply reinforcement learning to UAV data collection: the UAV receives feedback by interacting with the environment and learns an optimal data-collection strategy autonomously. Most of these studies optimize the UAV flight distance and ignore factors such as node data volume and communication range.
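In our earlier research on UAV spraying of fruit trees, we found that hovering energy has the larger influence on the UAV's overall energy consumption, so hovering time and flight time must be considered together. Therefore, for different data-collection scenarios of energy-limited agricultural UAVs, this paper uses deep reinforcement learning to study how the UAV should be scheduled. When redundant sensor coverage is taken into account, the UAV selectively collects data from a subset of nodes and plans the collection order, so that the collected data meet the coverage requirement while UAV energy efficiency is optimal. When the data of all sensor nodes must be collected, the UAV hovering positions and collection order are optimized to minimize UAV energy consumption.
When sensors are deployed, their sensing ranges often overlap so that the whole operating area is fully covered, and this is especially pronounced when sensors are deployed randomly. Limited by its energy and endurance, the UAV must select which sensors to collect from and optimize its flight path, in order to strike the best balance between the sensing-coverage requirement and UAV energy consumption.
Suppose the sensor nodes are randomly distributed in the farmland and the sensing range of each node is a circle centered on the node (this paper assumes all nodes have the same sensing radius), as shown in Fig. 1. During collection the UAV hovers directly above a node and collects the data of only one node at each hovering position. A sensor node wakes up when it receives the UAV's beacon and sends its data to the UAV; after finishing with that node, the UAV flies to the next selected node. The UAV is assumed to know the position of every node and its own current energy throughout collection. The UAV flies at a fixed altitude, so for simplicity its hovering positions are hereafter expressed in planar two-dimensional coordinates.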
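Fig.1 Partial-node data collection scenario under redundant coverage
In agricultural production, the data of all sensor nodes in a field sometimes have to be collected. If the UAV hovered once for every node, there would be too many hovering points and the UAV would consume a great deal of energy. The UAV communication area is a circle centered on the ground projection of the hovering point, and every node within it can exchange data with the UAV. By choosing the hovering positions and collection order appropriately, the UAV collects, at each hovering point, the data of all sensor nodes within its communication range, which reduces the number of hovers and the UAV energy consumption while still completing data collection from all sensor nodes.
As shown in Fig. 2, suppose there are wireless sensor nodes in the field, each accurately positioned (equipped with a GPS or BeiDou antenna), and all sensors have the same communication range and data buffer size. With the UAV flight altitude fixed at H, the sensor nodes that can communicate normally with the UAV at a given hovering point are those inside the circular area in Fig. 2, whose radius is R (computed from the channel transmission model); nodes in this area communicate with the UAV in a single hop. The UAV is assumed to fly at constant speed and fixed altitude; by selecting hovering positions and planning the flight path, the total energy consumed by the UAV in collecting the data of all sensor nodes is minimized. For simplicity, hovering positions are hereafter expressed in planar two-dimensional coordinates.
Note: H is the UAV flight altitude, m; R is the radius of the area within which the UAV at a hovering point can communicate normally with sensors, m.
When the UAV hovers to collect in-field node data, the hovering time depends on the data transmission rate, which is affected by channel loss and fading.
Let the transmit power of a sensor node during communication be fixed. By Shannon's theorem, the data transmission rate (bit/s) between the UAV at a given hovering position and node i is
Eq. (2) shows that, with other parameters unchanged, the farther the UAV is from the node, the lower the data transmission rate and thus the longer the UAV must hover.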
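To make this distance dependence concrete, here is a minimal Python sketch. It assumes a line-of-sight link whose SNR decays with the square of the 3-D UAV-node distance, SNR = γ0/d², where γ0 is the unit-distance SNR. The values γ0 = 34 dB, B = 10 kHz, and the 10 m altitude are borrowed from the simulation settings later in the paper; fading is ignored and 1 kB is taken as 8 000 bit. The exact form of Eq. (2) is not reproduced here, and the function names are hypothetical.

```python
import math

def link_rate_bps(horizontal_m, altitude_m=10.0, snr_1m_db=34.0, bandwidth_hz=10e3):
    """Shannon rate B*log2(1 + gamma0/d^2) for an assumed free-space (d^-2) link."""
    d_squared = horizontal_m ** 2 + altitude_m ** 2   # squared 3-D UAV-node distance, m^2
    gamma0 = 10 ** (snr_1m_db / 10)                    # unit-distance SNR, dB -> linear
    return bandwidth_hz * math.log2(1 + gamma0 / d_squared)

def hover_time_s(data_kb, horizontal_m, **link):
    """Upload time (s) for data_kb kilobytes, assuming 1 kB = 8 000 bit."""
    return data_kb * 8e3 / link_rate_bps(horizontal_m, **link)

for x in (0, 40, 80):   # horizontal offsets of the hovering point from the node, m
    print(f"{x:>3} m: {link_rate_bps(x):8.0f} bit/s, {hover_time_s(160, x):6.1f} s for 160 kB")
```

Under these assumptions, uploading 160 kB takes roughly ten times longer from 80 m away than from directly above the node, which is the flight-versus-hover trade-off the hovering-position optimization has to balance.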
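This paper adopts the rotary-wing UAV power model of Zeng et al.[19]:
From Eqs. (3)-(4) and (6)-(7), in the partial-node scenario under redundant coverage, the energy (kJ) consumed by the UAV from taking off at $S_{N-1}$ to finishing the data collection at $S_N$ is
Similarly, in the all-node scenario, the energy (kJ) consumed from taking off at $S_{N-1}$ to finishing the data collection at $S_N$ is
Finally, the total energy E (kJ) required for the UAV to complete the data-collection task at all hovering positions is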
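Zeng et al.[19] write the propulsion power of a rotary-wing UAV in level flight at speed V as the sum of a blade-profile term, an induced term, and a parasite term. The sketch below codes that expression. The parameter values are the illustrative defaults published in [19] and are not necessarily the ones in Table 1; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RotaryWingParams:
    # Illustrative defaults from Zeng et al. [19]; Table 1 of this paper may differ.
    P0: float = 79.86     # blade profile power in hover, W
    Pi: float = 88.63     # induced power in hover, W
    U_tip: float = 120.0  # rotor blade tip speed, m/s
    v0: float = 4.03      # mean rotor induced velocity in hover, m/s
    d0: float = 0.6       # fuselage drag ratio
    rho: float = 1.225    # air density, kg/m^3
    s: float = 0.05       # rotor solidity
    A: float = 0.503      # rotor disc area, m^2

def propulsion_power(V, p=RotaryWingParams()):
    """P(V) = P0(1 + 3V^2/U_tip^2) + Pi*sqrt(sqrt(1 + V^4/(4 v0^4)) - V^2/(2 v0^2))
    + 0.5*d0*rho*s*A*V^3, in W, for forward speed V in m/s."""
    blade = p.P0 * (1 + 3 * V**2 / p.U_tip**2)
    induced = p.Pi * math.sqrt(math.sqrt(1 + V**4 / (4 * p.v0**4)) - V**2 / (2 * p.v0**2))
    parasite = 0.5 * p.d0 * p.rho * p.s * p.A * V**3
    return blade + induced + parasite

for v in (0.0, 5.0, 10.0, 20.0):
    print(f"V = {v:4.1f} m/s -> P = {propulsion_power(v):7.2f} W")
```

At V = 0 the expression reduces to the hover power P0 + Pi. With these defaults the power at 5 m/s is below the hover power, the same ordering as the field measurement reported later (average flight power 746.38 W versus average hover power 771.86 W).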
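The structure of the energy expressions above can be sketched as follows: each leg contributes flight energy (flight power times flight time) plus hovering energy (hover power times upload time), and the legs are summed over the hovering positions. The sketch keeps the paper's stated assumptions of constant cruise speed and straight-line legs. It also assumes fixed flight and hover powers, using for illustration the field-measured averages reported later (746.38 W and 771.86 W), and adds no return leg. The route and hover times in the usage line are made up.

```python
import math

def mission_energy_kj(hover_points, hover_times_s, speed_mps=5.0,
                      p_fly_w=746.38, p_hover_w=771.86, start=(0.0, 0.0)):
    """Total energy (kJ): straight legs flown at constant speed plus hovering at each point."""
    energy_j, pos = 0.0, start
    for point, t_hover in zip(hover_points, hover_times_s):
        leg_m = math.dist(pos, point)            # straight-line leg length, m
        energy_j += p_fly_w * leg_m / speed_mps  # flight energy for this leg, J
        energy_j += p_hover_w * t_hover          # hovering (upload) energy, J
        pos = point
    return energy_j / 1e3

# Hypothetical route: two hovering points, 30 s of upload at each.
print(mission_energy_kj([(0.0, 100.0), (100.0, 100.0)], [30.0, 30.0]))
```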
Reinforcement learning is a class of machine learning methods for sequential decision-making problems[20]. Deep reinforcement learning (DRL) combines the perception capability of deep learning with the decision-making capability of reinforcement learning[21-22]. Deep learning extracts features from the environment state and passes them to the agent, which decides on and executes an action. After acting, the agent receives a reward or penalty signal from the environment together with the changed environment state, which drive its next action. Through iterative interaction with the environment, the agent selects a sequence of actions that maximizes the cumulative reward; that is, it optimizes sequential decisions from limited feedback[23]. In-field node data collection is essentially the selection of UAV hovering positions and the scheduling of collection order under many changing factors. Collecting each node either changes the utility of its neighbors' data or changes the distance cost of reaching the remaining nodes, so the task can be cast as a sequential decision problem. Based on DRL, this paper designs a Dueling Double Deep Q Network (DDDQN) algorithm and a two-level deep reinforcement learning (Double Deep Reinforcement Learning, DDRL) algorithm for the two scenarios above.
2.2.1 Task environment
Let the farmland be a rectangular area, as shown in Fig. 3, where the UAV collects node data along the dashed route. When node a is collected, the increment in data coverage is the entire circle centered on a; for node b, it is the circle centered on b minus shaded region 1; for node c, it is the circle centered on c minus shaded region 2; and for node d, the coverage increment is only shaded region 3. Because the data of node d are highly redundant with those of the nodes already collected, the UAV, which chooses its current action according to the reward function (Eq. (12)), may skip node d.
Note: a-e are sensor nodes; 1 is the redundant coverage of nodes a and b; 2 is the redundant coverage of nodes b and c; 3 is the coverage increment contributed by node d.
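The coverage increments illustrated in Fig. 3 can be approximated numerically. The sketch below is an assumption, not the paper's implementation: it rasterizes the field into small cells and counts the cells that a newly collected node's sensing disc adds to the union of discs already collected. The 640 m field and 80 m sensing radius match the simulation settings; the 5 m raster and the node coordinates are hypothetical.

```python
def disc_cells(center, radius, field=640.0, cell=5.0):
    """Raster cells (i, j) whose centers lie inside the sensing disc."""
    cx, cy = center
    n = int(field / cell)
    cells = set()
    for i in range(n):
        x = (i + 0.5) * cell
        if abs(x - cx) > radius:      # whole column is outside the disc
            continue
        for j in range(n):
            y = (j + 0.5) * cell
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                cells.add((i, j))
    return cells

def coverage_increments(route, radius=80.0, field=640.0, cell=5.0):
    """Area (m^2) that each node on the route adds to the covered union, plus final coverage rate."""
    covered, gains = set(), []
    for node in route:
        new = disc_cells(node, radius, field, cell) - covered
        gains.append(len(new) * cell * cell)
        covered |= new
    return gains, len(covered) / (field / cell) ** 2

# The third (hypothetical) node lies between the first two, so it adds only small slivers.
gains, rate = coverage_increments([(200.0, 200.0), (300.0, 200.0), (250.0, 200.0)])
print(gains, f"coverage rate = {rate:.3f}")
```

A node like the third one, whose disc is almost entirely inside the already-covered union, plays the role of node d: its small increment is what a coverage-based reward would weigh against the energy needed to reach it.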
2.2.2 DDDQN algorithm
The DDDQN algorithm merges the Double Deep Q Network (Double DQN)[24] with the Dueling Deep Q Network (Dueling DQN)[25]. Fig. 4 shows the DDDQN framework[26]. By combining the strengths of both, the algorithm addresses DQN's overestimation of action values and improves the network structure at the same time.
Note: s is the current state of the UAV; V(s) is the state value output by the neural network in state s; A(s,a) is the advantage output by the neural network for executing action a in state s; Q(s,a) is the sum of the state value V(s) and the advantage A(s,a).
In the redundant-coverage scenario, the state-action-reward framework of the UAV reinforcement learning model is as follows:
3) Reward function: because the UAV's endurance is limited, it should collect data from the scattered sensor nodes as quickly as possible and avoid spending much time and energy collecting data from redundantly covered areas. The reward function therefore balances positive and negative rewards and is defined as follows:
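How the two ideas combine can be sketched without a deep-learning framework. The dueling head aggregates V(s) and A(s,a); the figure note describes Q as their sum, and the sketch uses the mean-subtracted form of Wang et al.[25], which keeps V and A identifiable. The Double DQN target lets the online network choose the next action and the target network evaluate it. Plain Python lists stand in for network outputs, and the discount factor 0.9 is an assumed value.

```python
def dueling_q(value, advantages):
    """Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a')  (dueling aggregation)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, q_next_online, q_next_target, gamma=0.9, done=False):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)); y = r at a terminal state."""
    if done:
        return reward
    best = max(range(len(q_next_online)), key=q_next_online.__getitem__)  # online net selects
    return reward + gamma * q_next_target[best]                          # target net evaluates

q_online = dueling_q(1.0, [0.5, -0.5, 2.0])   # hypothetical online-network outputs at s'
q_target = dueling_q(0.8, [1.5, 0.0, 0.0])    # hypothetical target-network outputs at s'
print(q_online, double_dqn_target(0.2, q_online, q_target))
```

Here the online network picks action 2 and the target network's value for that action (0.3) is used, rather than the target network's own maximum (1.8) as plain DQN would; decoupling selection from evaluation is what curbs overestimation.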
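2.3.1 Task environment
All-node data collection differs from partial-node data collection in three respects: (1) the data of all nodes must be collected; (2) hovering positions are not restricted to points directly above nodes; and (3) the data of several nodes within communication range can be collected at one hovering position.
Fig. 5 illustrates all-node data collection. Let the farmland be a rectangular area. To locate the UAV hovering positions, the farmland is discretized into a grid of small cells, and the center of each cell is a candidate hovering point. The smaller the cells, the finer the optimization of hovering positions, but the higher the algorithmic complexity. The triangles denote the hovering collection points determined by a flight strategy, and the dashed line denotes the flight path. The UAV collects the data of sensors b and c at hovering point 2 and of sensors d and e at hovering point 3.
For the all-node scenario, modeling the system directly with DQN would cause an explosion in the dimension of the state space and make training difficult; the UAV might not even complete the collection task. This paper proposes a data-collection strategy based on two-level deep reinforcement learning (DDRL), which decomposes the problem into two subproblems to simplify the model and avoid the problems of applying DQN directly: 1) selecting the optimal collection area for the UAV with a Deep Q Network (DQN); 2) within the selected area, selecting hovering positions and determining the traversal order with an option-based n-step Deep Q Network (Option n-step Deep Q Network, OnDQN).
Note: n0 is the length of the rectangular area; k is the length of a small grid cell; f is a sensor node; 1-4 are UAV hovering collection points.
2.3.2 Optimal collection area selection based on DQN
3) Reward function: the reward function consists of three parts. The energy efficiency is the ratio of the increase in the number of sensor nodes collected after the current action to the energy consumed by that action, as shown in Eq. (13).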
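A minimal sketch of the discretization: each cell center is a candidate hovering point, and for each candidate we list the nodes within the horizontal communication radius R. The 600 m area, 15 m cells, and 80 m radius follow the simulation settings; the node coordinates and function names are hypothetical.

```python
import math

def candidate_hover_points(side_m=600.0, cell_m=15.0):
    """Centers of the (side/cell) x (side/cell) grid cells."""
    n = int(side_m // cell_m)
    return [((i + 0.5) * cell_m, (j + 0.5) * cell_m) for i in range(n) for j in range(n)]

def reachable_nodes(points, nodes, radius_m=80.0):
    """Map each candidate hovering point to the indices of nodes within the horizontal radius."""
    return {p: frozenset(k for k, q in enumerate(nodes) if math.dist(p, q) <= radius_m)
            for p in points}

nodes = [(100.0, 100.0), (150.0, 120.0), (400.0, 400.0)]   # hypothetical node positions
cover = reachable_nodes(candidate_hover_points(), nodes)
multi = [p for p, s in cover.items() if len(s) >= 2]
print(len(cover), "candidates;", len(multi), "can serve two or more nodes")
```

Shrinking `cell_m` increases the number of candidates quadratically, which is the complexity cost of finer hovering positions noted above.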
Note: 1-7 are sub-regions divided according to the overlapping communication ranges of the sensor nodes.
Fig.6 Schematic diagram of data collection sub-region division
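One plausible way to obtain sub-regions like those in Fig. 6 is to group grid cells by the set of nodes they can reach, so that cells with the same non-empty reachable set form one sub-region. This is an assumption; the paper's exact procedure is not reproduced here. The sketch is self-contained and uses hypothetical node positions whose three communication ranges overlap pairwise and jointly.

```python
import math
from collections import defaultdict

def subregions(nodes, side_m=600.0, cell_m=15.0, radius_m=80.0):
    """Group grid-cell centers by the non-empty set of nodes reachable from them."""
    n = int(side_m // cell_m)
    groups = defaultdict(list)
    for i in range(n):
        for j in range(n):
            p = ((i + 0.5) * cell_m, (j + 0.5) * cell_m)
            reach = frozenset(k for k, q in enumerate(nodes) if math.dist(p, q) <= radius_m)
            if reach:                      # cells reaching no node are not sub-regions
                groups[reach].append(p)
    return dict(groups)

regions = subregions([(100.0, 100.0), (180.0, 100.0), (140.0, 170.0)])
for reach, cells in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(reach), len(cells), "cells")
```

With three mutually overlapping ranges this yields seven sub-regions: one reaching all three nodes, three reaching a pair, and three reaching a single node.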
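where the penalty term is a negative constant.
In addition, to keep the UAV from running out of energy, a penalty is also given whenever its remaining energy becomes insufficient during collection:
where the penalty term is a negative constant.
2.3.3 Optimal trajectory planning based on OnDQN
The farther the UAV hovering point is from a sensor, the lower the data transmission rate and the longer the hovering time. After the optimal collection sub-region has been selected, the collection order of the sub-regions and the hovering positions must be decided so as to balance UAV flight distance against hovering time and thus minimize UAV energy consumption. In this problem the UAV action space comprises three actions: flying to a sub-region, hovering at a position in the sub-region, and collecting data. This paper adopts the idea of option-based hierarchical reinforcement learning to solve the problem[15,27] and models the state space, action space, reward function, and options as follows:
A sparse reward can make learning inefficient or even prevent convergence[28]. This paper therefore uses the n-step return instead of the 1-step return to speed up iteration[29]. The n-step return is defined as
Note that this study relies on several assumptions. The UAV is assumed to fly at constant speed, so the change in energy consumption while accelerating from hover to cruise and decelerating from cruise to hover is not considered. The UAV is also assumed to fly in a straight line between positions, whereas practical applications must also account for UAV turning.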
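The energy-efficiency term of Eq. (13) and the low-battery penalty can be sketched as below. The battery threshold, the penalty value, and the function name are hypothetical placeholders rather than the paper's constants, and the third reward component is left out of the sketch.

```python
def area_reward(new_nodes, energy_kj, battery_kj, *,
                low_battery_kj=50.0, low_battery_penalty=-10.0):
    """Energy-efficiency reward (nodes gained per kJ) plus a low-battery penalty.

    low_battery_kj and low_battery_penalty are hypothetical placeholder values.
    """
    reward = new_nodes / energy_kj if energy_kj > 0 else 0.0  # Eq. (13)-style efficiency term
    if battery_kj < low_battery_kj:                            # remaining energy too low
        reward += low_battery_penalty                          # negative constant penalty
    return reward

print(area_reward(3, 30.0, 200.0), area_reward(3, 30.0, 20.0))
```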
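The standard n-step return sums n discounted rewards and then bootstraps from the value estimate n steps ahead, so a sparse reward reaches the update n steps earlier than with a 1-step target. A minimal sketch, assuming the bootstrap value max_a Q(s_{t+n}, a) is supplied by the caller and that the bootstrap term is dropped when the episode ends within the n steps; the discount factor default is an assumed value.

```python
def n_step_return(rewards, bootstrap_q, gamma=0.9, terminal=False):
    """G_t = sum_{k<n} gamma^k r_{t+k+1} + gamma^n * max_a Q(s_{t+n}, a).

    rewards: the n rewards r_{t+1} .. r_{t+n}; bootstrap_q: max_a Q(s_{t+n}, a).
    """
    g = 0.0
    for k, r in enumerate(rewards):
        g += gamma ** k * r
    if not terminal:                           # no bootstrap beyond a terminal state
        g += gamma ** len(rewards) * bootstrap_q
    return g

print(n_step_return([0.0, 0.0, 1.0], bootstrap_q=2.0, gamma=0.5))
```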
To verify the effectiveness of the node data-collection algorithms, simulations were run for the partial-node scenario under redundant coverage and for the all-node scenario on Windows 10 with an AMD Ryzen 5 2500U processor at 2.0 GHz. The network architecture was built with Google's open-source TensorFlow, and the deep reinforcement learning simulation environment was built in Python. Table 1 lists the parameters of the rotary-wing UAV power model.
Table 1 Parameters of the rotary-wing UAV
The simulation assumes 20 wireless sensor nodes deployed uniformly at random in a 640 m × 640 m field. Each node has a sensing coverage radius of 80 m, the UAV flies at 5 m/s at an altitude of 10 m, and each sensor stores 160 kB of data. Balancing computational complexity against optimization quality, the field is discretized into 40 m × 40 m cells, 256 cells in total. Algorithm performance is evaluated by the coverage rate and the effective-coverage average energy consumption, defined as follows:
When the reward function was set according to Eq. (12), training showed that the range of the negative reward was far larger than that of the positive reward, so the positive incentive had almost no effect. The raw reward values were therefore normalized with a logarithmic transform:
Fig. 8 compares the learning of DDDQN and DQN when configuration 4, the best configuration, is used for the reward-function tuning factors. DDDQN learns more stably than DQN and obtains a higher episode reward at the end of training: under configuration 4, DDDQN obtains a cumulative reward of 12 per episode, whereas DQN obtains at most 9.
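The exact transform used here is not reproduced. A common choice with the same intent, shown below as an assumption, is the signed logarithm sign(r)·ln(1 + |r|), which compresses large magnitudes while keeping the sign and leaving small rewards nearly unchanged.

```python
import math

def signed_log(r):
    """Compress reward magnitude while keeping its sign: sign(r) * ln(1 + |r|)."""
    return math.copysign(math.log1p(abs(r)), r)

raw = [-500.0, -5.0, 0.5, 5.0]                # hypothetical raw rewards
print([round(signed_log(r), 3) for r in raw])
```

In this example the ratio between the largest negative and largest positive reward shrinks from 100 to about 3.5, so the positive term is no longer swamped.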
Table 2 Tuning-factor configurations
Note: the table lists the two tuning factors of the reward function for each tuning-factor configuration group, identified by its configuration number.
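Fig.7 Coverage rate and effective-coverage average energy consumption under different tuning-factor configurations
Fig.8 Comparison of training rewards of the DDDQN and DQN algorithms
Fig. 9 shows the UAV data-collection schemes of DDDQN and DQN when the coverage of collected nodes exceeds 80%. The flight distance of DDDQN is 3.13 km versus 4.34 km for DQN, so DDDQN reduces UAV energy consumption by 27.9% and shortens the flight distance by 1.21 km. From Eqs. (19)-(20), the effective-coverage average energy consumption of DDDQN is 26.3% lower than that of DQN.
Fig.9 Data-collection schemes of the DDDQN and DQN algorithms
In the simulation, wireless sensor nodes are randomly distributed in a 600 m × 600 m rectangular area divided into 15 m × 15 m cells. The lower-left corner of the area is the UAV take-off point and the flight altitude is 10 m; the signal-to-noise ratio between the UAV and a sensor node at unit distance (1 m) is 34 dB, and the communication bandwidth is 10 kHz.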
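A quick consistency check of the reported figures: the relative reduction in flight distance matches the reported 27.9% energy reduction, which is what one would expect if the energy difference between the two schemes is driven by flight distance at the fixed 5 m/s speed.

```latex
\frac{4.34\ \text{km} - 3.13\ \text{km}}{4.34\ \text{km}}
  = \frac{1.21}{4.34} \approx 0.279 = 27.9\%
```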
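Fig. 10 compares the proposed DDRL algorithm with the classical PSO-TSP algorithm under different conditions. PSO-TSP builds on the IGA method proposed by Chen et al.[30]; it requires the UAV to visit every node and hover directly above each one to collect its data.
Fig. 10a shows the effect of UAV flight speed on total energy consumption and total working time. The simulation uses 20 sensors, each storing 160 kB with a communication radius of 80 m. For both algorithms, a higher flight speed reduces the UAV's total energy consumption and working time. At the same speed DDRL consumes less energy than PSO-TSP, and its advantage is larger at low speeds: at 5 m/s, DDRL reduces total UAV energy consumption by 7.8% and working time by 9.2% relative to PSO-TSP.
Fig. 10b shows the effect of node data load on UAV energy consumption, with 20 sensor nodes and communication radii of 60 and 80 m. DDRL consumes less energy than PSO-TSP when the node data volume is small (below 160 kB), and its advantage is larger at a 60 m radius than at 80 m. As the node data volume grows, PSO-TSP still collects directly above each node (short distance, short collection time), whereas DDRL collects several nodes' data at each hovering point; some of those nodes are far away, so collection takes longer, hovering time and energy rise, and DDRL's total energy eventually exceeds that of PSO-TSP, especially with a large communication range. As the node data volume increases, the share of flight energy falls because hovering energy rises. For the same data volume, the flight-energy share of DDRL is lower than that of PSO-TSP, and the flight-energy share at an 80 m communication distance is lower than at 60 m; that is, the hovering-energy share is higher.
Fig. 10c shows the effect of flight speed on UAV hovering collection time, with a horizontal communication distance of 80 m and 160 kB per node. Because the PSO-TSP UAV hovers directly above each sensor, the transmission distance is short, so its hovering collection time is the lowest and does not depend on flight speed. In DDRL, hovering collection time decreases as flight speed increases: DDRL optimizes total energy through hovering-point selection and ordering, and as flight speed rises, flight energy falls and hovering energy becomes the dominant factor in total energy consumption.
Fig. 10d shows the effect of the number of sensor nodes on total energy consumption with a horizontal communication distance of 60 m and 160 kB per node. DDRL is compared with PSO-TSP and the MEFC (Minimized Energy Flight Control) algorithm[31]. MEFC accounts for the effect of UAV flight speed and turning angle on energy consumption, finds the optimal flight speed, optimizes the flight trajectory, and places hovering points at the edge of each sensor's transmission range so that the UAV completes data collection with low energy consumption. Fig. 10d shows that PSO-TSP consumes the most energy, because the UAV must fly directly above every sensor node and the longer flight path greatly increases flight energy. MEFC places all hovering points at the edge of the sensor transmission ranges and ignores overlapping ranges; this reduces flight distance and flight energy but lengthens data transmission at each hovering point, i.e., it increases hovering time and hovering energy. The total energy consumption of all three algorithms grows with the number of nodes. Therefore, when the data volume is moderate, DDRL has an advantage in total energy consumption over the other two algorithms.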
DDRL: Double Deep Reinforcement Learning; PSO-TSP: Particle Swarm Optimization-Traveling Salesman Problem; MEFC: Minimized Energy Flight Control; 60,80: Horizontal communication distance, m.
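For the all-node scenario, field experiments were conducted to evaluate the proposed method and verify its feasibility. The experiments were carried out at the Zengcheng Teaching and Research Base of South China Agricultural University in Guangzhou, using a self-built quadrotor UAV over a flat 210 m × 400 m field at an altitude of 5 m. To check the packet loss rate of the data received by the UAV and to estimate the in-field communication path loss, so that the UAV hovering time could be calculated more accurately, two DRF1609H Zigbee modules were used to measure packet loss and received signal strength between the UAV receiving module and the ground data-sending module. The ground sending module consisted of a Zigbee communication module and an STM32 microcontroller; the transmit power of the DRF1609H Zigbee module is 22 dBm.
As shown in Fig. 11, the UAV started directly above the ground node and was moved in 10 m horizontal steps up to a horizontal distance of 120 m from the node. At each position, after the ground node received the UAV's data request, it sent 200 packets of 0.5 kB each; the packet loss results are listed in Table 3. The received signal strength at each position is the average of 10 measurements.
Fig.11 Field experiment
Table 3 shows that beyond a horizontal communication distance of 80 m the packet loss rate increases with distance, whereas within 80 m data transmission is stable enough to meet requirements.
Table 3 Effect of horizontal communication distance on packet loss rate
Curve fitting in MATLAB was used to evaluate the path-loss model and determine the relationship between horizontal communication distance and received signal strength. As shown in Fig. 12, repeated trials showed that an exponential function fits best; the relationship between the RSSI of in-field ZigBee transmission and the horizontal communication distance is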
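The fitted coefficients are not reproduced here, but the fitting step can be sketched. Assuming the model RSSI = a·exp(b·d) with a < 0 (RSSI in dBm), taking logs of −RSSI gives a straight line that ordinary least squares can fit in closed form. Note that this log-space fit is not identical to MATLAB's nonlinear least-squares fit (for example the `exp1` model of the Curve Fitting Toolbox) when the data are noisy. The data below are synthetic, generated from known coefficients, not measurements.

```python
import math

def fit_exponential(distances, rssi_dbm):
    """Fit RSSI = a*exp(b*d) with a < 0 by least squares on ln(-RSSI) = ln(-a) + b*d."""
    ys = [math.log(-r) for r in rssi_dbm]        # requires every RSSI value to be negative
    n = len(distances)
    mx, my = sum(distances) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances, ys))
    b = sxy / sxx                                 # slope in log space
    a = -math.exp(my - b * mx)                    # intercept mapped back, sign restored
    return a, b

# Synthetic check: noiseless data from a = -40 dBm, b = 0.005 1/m at 0-120 m in 10 m steps.
d = list(range(0, 130, 10))
rssi = [-40.0 * math.exp(0.005 * x) for x in d]
print(fit_exponential(d, rssi))
```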
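To further evaluate how well the proposed algorithm reduces UAV energy consumption, a current-meter module was mounted on the UAV to obtain the instantaneous current and compute energy consumption. The flight power and hovering power of the UAV were measured first: the UAV flew straight for 100 m at a constant 5 m/s and an altitude of 5 m, and its flight power was computed from the instantaneous current and voltage sampled every 0.1 s. The hovering energy per unit time was computed in the same way. The quadrotor UAV used in this experiment had an average flight power of 746.38 W, an average hovering power of 771.86 W, and a maximum battery capacity of 22 000 mAh.
Finally, 11 Zigbee modules were placed randomly in the field to simulate nodes. Using the in-field path-loss model, the DDRL and PSO-TSP algorithms were run to obtain the planned hovering points, hovering times, and collection orders, and the UAV then collected data according to each plan. The node data volume was 160 kB, the packet length 0.5 kB, the flight altitude 5 m, and the flight speed 5 m/s. Fig. 13a shows the PSO-TSP flight path, and Fig. 13b shows the hovering points and flight path optimized by DDRL. With DDRL, total UAV energy consumption was 354.56 kJ, the flight distance 1 189.23 m, and the packet loss rate 0.28%; with PSO-TSP, total energy consumption was 400.83 kJ, the flight distance 1 556.21 m, and the packet loss rate 0.15%. Compared with PSO-TSP, DDRL reduced total energy consumption by 11.5% and the flight path by 366.98 m. The packet loss rate of DDRL was slightly higher than that of PSO-TSP because in the DDRL plan the hovering points are not directly above the nodes and several nodes are collected at one hovering point, whereas in the PSO-TSP plan the UAV hovers directly above each node, which lowers the probability of packet loss.
Fig.13 UAV hovering points and data collection order of the PSO-TSP and DDRL algorithms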
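The power measurement described above amounts to multiplying synchronized current and voltage samples and summing over the 0.1 s sampling interval. A minimal sketch using the rectangle rule; the sample values are hypothetical and the function names are illustrative.

```python
def energy_from_samples(currents_a, voltages_v, dt_s=0.1):
    """Energy (J) as the sum of instantaneous power I*V over samples taken every dt_s."""
    return sum(i * v for i, v in zip(currents_a, voltages_v)) * dt_s

def average_power(currents_a, voltages_v):
    """Mean of the instantaneous power samples, W."""
    powers = [i * v for i, v in zip(currents_a, voltages_v)]
    return sum(powers) / len(powers)

# Hypothetical 20 s straight flight (100 m at 5 m/s) -> 200 samples at 0.1 s.
currents = [30.0] * 200   # A
voltages = [24.9] * 200   # V
print(average_power(currents, voltages), energy_from_samples(currents, voltages))
```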
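The reported field results are internally consistent, as a quick check of the percentages shows:

```latex
\frac{400.83\ \text{kJ} - 354.56\ \text{kJ}}{400.83\ \text{kJ}}
  = \frac{46.27}{400.83} \approx 11.5\%,
\qquad
1\,556.21\ \text{m} - 1\,189.23\ \text{m} = 366.98\ \text{m}
```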
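Using UAVs to collect data from in-field wireless sensor nodes overcomes the lack of network infrastructure in farmland as well as the rapid energy drain and short network lifetime caused by multi-hop forwarding. This study divides node data collection into two scenarios, partial-node collection under redundant sensing coverage and all-node collection, and uses deep reinforcement learning to plan UAV node selection, hovering positions, and collection order, saving UAV energy and shortening mission completion time. The partial-node scheme suits applications in which the proportion of redundantly covered area between nodes is high, the UAV's energy is insufficient to collect from all nodes, and data completeness requirements are low; the all-node scheme suits applications that require complete data. The results lead to the following conclusions:
1) In the partial-node scenario under redundant sensing coverage, using the Dueling Double Deep Q Network (DDDQN) to select collection nodes and plan the collection order improves UAV energy efficiency and reduces the collection of redundant data. Simulations verified that, under the same configuration, DDDQN achieves better coverage and average energy consumption than DQN with more stable performance; for the same coverage requirement, DDDQN shortens the flight distance by 1.21 km and reduces energy consumption by 27.9% relative to DQN.
2) In the all-node scenario, the proposed two-level deep reinforcement learning (DDRL) algorithm optimizes UAV hovering positions and collection order and reduces the total energy consumed to complete the task. Simulations compared DDRL with PSO-TSP and MEFC in terms of total energy consumption, total time, flight-energy share, and hovering collection time under different sensor data loads, UAV flight speeds, and numbers of sensor nodes, and showed that DDRL gives the lowest total UAV energy consumption when the per-node data volume is below 160 kB. Finally, field tests measured the flight and hovering power of a quadrotor UAV, and the collection plans of DDRL and the classical PSO-TSP algorithm were flown in the field. The results show that DDRL accounts for both flight distance and data transmission time and, when collecting the same data, consumes 11.5% less energy than PSO-TSP.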
[1] García L, Parra L, Jimenez J M, et al. DronAway: A proposal on the use of remote sensing drones as mobile gateway for WSN in precision agriculture[J]. Applied Sciences, 2020, 10(19): 6668.
[2] Song Chengbao, Liu Pingzeng, Liu Xinghua, et al. Optimal configuration strategy for temperature sensors in solar greenhouse based on HSIC[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(8): 200-207. (in Chinese with English abstract)
[3] Bandur D, Jaksic B, Bandur M, et al. An analysis of energy efficiency in Wireless Sensor Networks (WSNs) applied in smart agriculture[J]. Computers and Electronics in Agriculture, 2019, 156: 500-507.
[4] Polo J, Hornero G, Duijneveld C, et al. Design of a low-cost wireless sensor network with UAV mobile node for agricultural applications[J]. Computers and Electronics in Agriculture, 2015, 119: 19-32.
[5] Zhang B T, Meng L Y. Energy efficiency analysis of wireless sensor networks in precision agriculture economy[J]. Scientific Programming, 2021, 2021: 8346708.
[6] Huang S C, Chang H Y. A farmland multimedia data collection method using mobile sink for wireless sensor networks[J]. Multimedia Tools and Applications, 2017, 76(19): 19463-19478.
[7] Singh P K, Sharma A. An intelligent WSN-UAV-based IoT framework for precision agriculture application[J]. Computers and Electrical Engineering, 2022, 100: 107912.
[8] Yemeni Z, Wang H, Ismael W M, et al. Reliable spatial and temporal data redundancy reduction approach for WSN[J]. Computer Networks, 2021, 185: 107701.
[9] Kumar S, Chaurasiya V K. A strategy for elimination of data redundancy in internet of things (IoT) based wireless sensor network (WSN)[J]. IEEE Systems Journal, 2018, 13(2): 1650-1657.
[10] Rezende J D V, da Silva R I, Souza M J F. Gathering big data in wireless sensor networks by drone[J]. Sensors, 2020, 20(23): 6954.
[11] Luo C W, Chen W P, Li D Y, et al. Optimizing flight trajectory of UAV for efficient data collection in wireless sensor networks[J]. Theoretical Computer Science, 2021, 853: 25-42.
[12] Ben Ghorbel M, Rodríguez-Duarte D, Ghazzai H, et al. Joint position and travel path optimization for energy efficient wireless data gathering using unmanned aerial vehicles[J]. IEEE Transactions on Vehicular Technology, 2019, 68(3): 2165-2175.
[13] Gong J, Chang T H, Shen C, et al. Flight time minimization of UAV for data collection over wireless sensor networks[J]. IEEE Journal on Selected Areas in Communications, 2018, 36(9): 1942-1954.
[14] Just G E, Pellenz M E, Lima L A D, et al. UAV path optimization for precision agriculture wireless sensor networks[J]. Sensors, 2020, 20(21): 6098.
[15] Zhang Y, Mou Z Y, Gao F F, et al. Hierarchical deep reinforcement learning for backscattering data collection with multiple UAVs[J]. IEEE Internet of Things Journal, 2021, 8(5): 3786-3800.
[16] Jiang Baoqing, Chen Hongbin. Trajectory planning for unmanned aerial vehicle assisted WSN data collection based on Q-Learning[J]. Computer Engineering, 2021, 47(4): 127-134. (in Chinese with English abstract)
[17] Yi M J, Wang X J, Liu J, et al. Deep reinforcement learning for fresh data collection in UAV-assisted IoT networks[C]//IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). Toronto, ON, Canada: IEEE, 2020: 716-721.
[18] Fu Shu, Yang Xiangyue, Zhang Haijun, et al. UAV path intelligent planning in IoT data collection[J]. Journal on Communications, 2021, 42(2): 124-133. (in Chinese with English abstract)
[19] Zeng Y, Xu J, Zhang R. Energy minimization for wireless communication with Rotary-Wing UAV[J]. IEEE Transactions on Wireless Communications, 2019, 18(4): 2329-2345.
[20] Padakandla S. A survey of reinforcement learning algorithms for dynamically varying environments[J]. ACM Computing Surveys, 2021, 54(6): 127.
[21] Chen Jiapan, Zheng Minhua. A survey of robot manipulation behavior research based on deep reinforcement learning[J]. Robot, 2022, 44(2): 236-256. (in Chinese with English abstract)
[22] Fenjiro Y, Benbrahim H. Deep reinforcement learning overview of the state of the art[J]. Journal of Automation, Mobile Robotics and Intelligent Systems, 2018, 12: 20-39.
[23] Zhang Zidong, Qiu Caiming, Zhang Dongxia, et al. A coordinated control method for hybrid energy storage system in microgrid based on deep reinforcement learning[J]. Power System Technology, 2019, 43(6): 1914-1921. (in Chinese with English abstract)
[24] Zhang W Y, Gai J Y, Zhang Z G, et al. Double-DQN based path smoothing and tracking control method for robotic vehicle navigation[J]. Computers and Electronics in Agriculture, 2019, 166: 104985.
[25] Wang Z Y, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York, NY, USA: ICML, 2016: 1995-2003.
[26] Kumar H, Mammen P M, Ramamritham K. Explainable AI: deep reinforcement learning agents for residential demand side cost savings in smart grids[J]. arXiv e-prints, 2019: 1910.08719.
[27] Zhao Minghui, Zhang Xuebo, Guo Xian, et al. A general assembly sequence planning algorithm based on hierarchical reinforcement learning[J]. Control and Decision, 2022, 37(4): 861-870. (in Chinese with English abstract)
[28] Yang Weiyi, Bai Chenjia, Cai Chao, et al. Survey on sparse reward in deep reinforcement learning[J]. Computer Science, 2020, 47(3): 182-191. (in Chinese with English abstract)
[29] Hernandez-Garcia J F, Sutton R S. Understanding multi-step deep reinforcement learning: A systematic study of the DQN target[J]. arXiv e-prints, 2019: 1901.07510.
[30] Chen J, Ye F, Li Y B. Travelling salesman problem for UAV path planning with two parallel optimization algorithms[C]//2017 Progress in Electromagnetics Research Symposium-Fall (PIERS-FALL). Singapore, 2017: 832-837.
[31] Wu Mei. Energy Efficient UAV Flight Planning System for the Industrial IoT Environment[D]. Nanjing: Southeast University, 2019. (in Chinese with English abstract)
UAV collection methods for the farmland nodes data based on deep reinforcement learning
Hu Jie1,3, Zhang Yali2,3※, Wang Tuan1, Wang Mengcheng1, Lan Yubin1,3, Zhang Zhixun2
(1. College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China; 2. College of Engineering, South China Agricultural University, Guangzhou 510642, China; 3. National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou 510642, China)
Unmanned Aerial Vehicles (UAVs) have been widely used to collect data from wireless sensor nodes in fields. This approach avoids several problems, such as the lack of network infrastructure in farmland, fast energy consumption caused by multi-hop data forwarding, premature death of nodes near the gateway, and a shortened network lifetime. However, the data of adjacent sensors may be redundant, and a UAV can cover multiple nodes at the same time. In this study, a UAV data collection method was proposed to plan node selection, hovering positions, and collection order using improved deep reinforcement learning. UAV data collection from sensor nodes was divided into two scenarios: data collection from partial nodes under perceptual redundancy coverage, and data collection from all nodes. The optimization aimed to save UAV energy and shorten mission completion time. Partial-node data collection under perceptual redundancy coverage suits cases where the proportion of redundantly covered area among nodes is relatively high, the UAV energy is insufficient to complete data collection from all nodes, and the requirement for data integrity is low. By contrast, all-node data collection suits cases with a high requirement for data integrity. In the scenario of partial-node data collection under perceptual redundancy coverage, the Dueling Double Deep Q Network (DDDQN) was used to select the collection nodes and plan the collection order, improving UAV energy efficiency and reducing redundant data. Simulation results show that the DDDQN achieved greater data coverage and lower effective-coverage average energy consumption than the Deep Q Network (DQN) under the same configuration. The training process of DDDQN was more stable than that of DQN, with higher returns at the end of learning.
More importantly, the flight distance and energy consumption of the DDDQN were reduced by 1.21 km and 27.9%, respectively, compared with the DQN. In the scenario of all-node data collection, a Double Deep Reinforcement Learning (DDRL) method was proposed to optimize the hovering positions and collection sequence of the UAV, in order to minimize its total energy consumption during data collection. The DDRL was compared with the classical PSO-TSP and MEFC. A systematic evaluation was made of the impact of UAV flight speed on total energy consumption and total working time, of different node data loads on UAV energy consumption, of flight speed on UAV hovering collection time, and of the number of sensor nodes on total energy consumption. The simulation results show that, when the data load of a single node was less than 160 kB, the total energy consumption of the proposed method was at least 6.3% less than that of the classical PSO-based Traveling Salesman Problem (PSO-TSP) algorithm and the Minimized Energy Flight Control (MEFC) algorithm under different node numbers and UAV flight speeds. Finally, in field experiments, the packet loss rate and received signal strength between the UAV and ground nodes were measured, and the flight and hover powers of the quadrotor UAV were tested. Field flights were then carried out with the data collection plans of the DDRL and the classical PSO-TSP. Field experiment results show that the DDRL-based data collection reduced the total energy consumption of the UAV by 11.5% compared with the PSO-TSP. The DDDQN and DDRL approaches can provide a reference for energy-efficient UAV data collection from wireless sensor nodes in the field.
UAV; deep reinforcement learning; node data collection; perceptual redundancy; DQN; DRL
doi: 10.11975/j.issn.1002-6819.2022.22.005
CLC number: S126; S-3
Document code: A
Article ID: 1002-6819(2022)-22-0041-11
Hu Jie, Zhang Yali, Wang Tuan, et al. UAV collection methods for the farmland nodes data based on deep reinforcement learning[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(22): 41-51. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2022.22.005 http://www.tcsae.org
Received date: 2022-07-21
Revised date: 2022-10-27
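Foundation items: Programme of Introducing Talents of Discipline to Universities (111 Project) (D18019); National Natural Science Foundation of China (32271997); Key-Area Research and Development Program of Guangdong Province (2019B020221001); Science and Technology Planning Project of Guangdong Province (2018A050506073)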
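Biography: Hu Jie, PhD, associate professor, research interests: agricultural artificial intelligence and agricultural Internet of Things. Email: hjgz79@scau.edu.cn
Corresponding author: Zhang Yali, PhD, associate professor, research interests: agricultural aviation sensor technology and environmental monitoring of agricultural production areas. Email: ylzhang@scau.edu.cn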
Transactions of the Chinese Society of Agricultural Engineering, 2022, No. 22