WANG Yanjun
Abstract: Distributed routing algorithms are widely used in cognitive radio networks (CRNs). This paper analyzes the distributed cooperative multi-agent routing problem in multi-hop CRNs. The problem is modeled as a decentralized partially observable Markov decision process (DEC-POMDP), which keeps the interference from secondary users to primary users below a predefined threshold while controlling the end-to-end delay. A multi-agent learning algorithm is then applied to solve the model, yielding the multi-agent learning-based routing (MALR) scheme. Experimental results show that the proposed routing controls delay and reduces the interference probability.
Keywords: cognitive radio network; MALR; Markov decision process; interference reduction; multi-agent learning; delay control
CLC number: TN915.04-34; TP393    Document code: A    Article ID: 1004-373X(2019)19-0023-05
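As wireless applications expand, the demand for radio spectrum keeps growing. When a band is idle, the licensed users (also called primary users, PUs) still hold priority access to it. Cognitive radio networks (CRNs) address the reuse of licensed spectrum [1-2]. In a CRN, secondary users (SUs) may access licensed spectrum provided that they do not interfere with PU transmissions. Like conventional wireless networks, CRNs can be centralized or distributed (ad hoc). In a centralized network, a single base station provides spectrum access and single-hop communication for the SUs. In a distributed network, SUs communicate with other users in the network over multiple hops.
Unlike in conventional multi-hop wireless networks, routing design in CRNs is challenging and must account for several factors. First, the routing protocol should rely on a realistic model of PU activity. Second, CRNs are distributed by nature: SUs cannot rely on a common control channel to receive network-wide information and must make routing decisions from local information only, so the routing itself must be distributed. Third, the routing performance of SU traffic depends heavily on the CRN environment, in particular on PU activity and on the traffic of other SUs. Rapid environmental change in CRNs therefore deserves particular attention [3-6].
This paper therefore considers the distributed cooperative multi-agent routing problem in CRNs. The constraint is that the number of PU packets lost because of SU transmissions must stay below a predefined threshold. PU activity is modeled with a Markov modulated Poisson process (MMPP), the problem model is built on it, and multi-agent learning is then applied to solve the model and obtain stable routes. Experimental data show that the proposed MALR (Multi-Agent Learning-Based Routing) effectively reduces delay and controls the interference rate.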
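The MMPP used for PU activity can be illustrated with a minimal sketch. It assumes a two-state modulating chain (state 0 for low PU activity, state 1 for high PU activity); the function name `simulate_mmpp`, the number of states, and every rate value are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_mmpp(rates, switch_rates, horizon, seed=0):
    """Simulate a two-state MMPP and return the packet arrival times in order.

    rates[s]:        Poisson packet arrival rate while the chain is in state s
    switch_rates[s]: rate of leaving state s (sojourn time is exponential)
    horizon:         length of the simulated time span
    All parameters are hypothetical; the paper's values are not given here.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < horizon:
        # Time at which the modulating Markov chain leaves the current state.
        end = min(t + rng.expovariate(switch_rates[state]), horizon)
        if rates[state] > 0:
            # Poisson arrivals inside the sojourn: exponential gaps at rates[state].
            a = t + rng.expovariate(rates[state])
            while a < end:
                arrivals.append(a)
                a += rng.expovariate(rates[state])
        t = end
        state = 1 - state  # only two states, so the chain alternates
    return arrivals


if __name__ == "__main__":
    # Bursty PU traffic: 1 packet/s in the low state, 10 packets/s in the high state.
    times = simulate_mmpp(rates=(1.0, 10.0), switch_rates=(0.5, 0.5), horizon=60.0)
    print(len(times), "PU packets generated in 60 s")
```

With equal switching rates the chain spends about half of its time in each state, so the long-run arrival rate is roughly the average of the two per-state rates. A routing agent could use such a trace to estimate how often an SU transmission on a channel would collide with PU packets.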
1.1  System model
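This paper analyzes the routing problem in cognitive radio networks and proposes MALR, a multi-agent learning-based routing scheme. The problem is first modeled as a Markov decision process and then solved with a multi-agent learning algorithm, so that data are delivered quickly while interference to other links is kept under control. Experimental data show that, compared with the FPLA and OPERA algorithms, the proposed MALR reduces transmission delay and lowers the interference rate.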
References
[1] ABDELAZIZ S, ELNAINAY M. Metric-based taxonomy of routing protocols for cognitive radio Ad Hoc networks [J]. Journal of network and computer applications, 2014, 40(3): 151-163.
[2] AL-RAWI H A A, YAU K L A, MOHAMAD H, et al. A reinforcement learning-based routing scheme for cognitive radio Ad Hoc networks [C]// Proceedings of 2014 IFIP Wireless and Mobile Networking Conference. Vilamoura: IEEE, 2014: 1-8.
[3] BARVE S, KULKARNI P. Multi-agent reinforcement learning based opportunistic routing and channel assignment for mobile cognitive radio Ad Hoc network [J]. Mobile networks and applications, 2014, 19(6): 720-730.
[4] SHEN Yanxia, XUE Xiaosong. Path optimization strategy of WSNs mobile beacon nodes [J]. Transducer and microsystem technologies, 2012, 31(12): 42-46. (in Chinese)
[5] CHEN Yourong, WANG Zhangquan, CHENG Juhua, et al. Lifetime optimized routing algorithm based on shortest path tree [J]. Chinese journal of sensors and actuators, 2012, 25(3): 406-413. (in Chinese)
[6] CALEFFI M, AKYILDIZ I F. OPERA: optimal routing metric for cognitive radio Ad Hoc networks [J]. IEEE transactions on wireless communication, 2012, 11(5): 2884-2894.
[7] CHU S C A, ALFA A S. A model for bursty PU channel and its impact on the study of cognitive radio networks [C]// Proceedings of 2013 International Wireless Communications and Mobile Computing Conference. Sardinia: IEEE, 2013: 461-466.
[8] DING L, MELODIA T, BATALAMA N. Distributed resource allocation in cognitive and cooperative Ad Hoc networks through joint routing, relay selection and spectrum allocation [J]. Computer networks, 2015, 83(3): 315-331.
[9] EL-SHERIF A A, MOHAMED A. Joint routing and resource allocation for delay minimization in cognitive radio based mesh networks [J]. IEEE transactions on wireless communication, 2014, 13(5): 186-197.
[10] GRONDMAN I, BUSONIU L. A survey of actor-critic reinforcement learning: standard and natural policy gradients [J]. IEEE transactions on systems, man and cybernetics, 2012, 42(6): 1291-1307.
[11] KAE W C, HOSSAIN E. Estimation of primary user parameters in cognitive radio systems via hidden Markov model [J]. IEEE transactions on signal processing, 2013, 61(7): 782-795.
[12] LIANG Q, WANG X, TIAN X. Two-dimensional route switching in cognitive radio networks: a game-theoretical framework [J]. IEEE/ACM transactions on networking, 2015, 2(23): 1053-1066.
[13] PING S, AIJAZ A. SACRP: a spectrum aggregation-based cooperative routing protocol for cognitive radio Ad-Hoc networks [J]. IEEE transactions on communications, 2015, 63(8): 2015-2030.
[14] ZHU Quanyan, YUAN Zhou, SONG Jubin, et al. Interference aware routing game for cognitive radio multi-hop networks [J]. IEEE journal on selected areas in communications, 2012, 30(10): 2006-2015.