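Abstract: To predict the evapotranspiration of cotton under mulched drip irrigation scientifically and accurately, a WOA-XGBoost cotton evapotranspiration prediction model was proposed based on the whale optimization algorithm (WOA) and extreme gradient boosting (XGBoost). The maximal information coefficient (MIC) was used to screen the key factors affecting cotton evapotranspiration, input combinations were built according to the ranking of the correlation coefficients, and these combinations were fed into the WOA-XGBoost model for simulation. The predictions were compared with those of the XGBoost, SVM, WOA-SVM and PSO-XGBoost models. The results show that solar radiation, minimum air temperature, maximum air temperature, relative humidity, wind speed and soil temperature are strongly correlated with cotton evapotranspiration, with MIC values of 0.722, 0.546, 0.496, 0.475, 0.379 and 0.219, respectively. The WOA-XGBoost model built on these 6 factors has the best overall performance, with R2, MAE, RMSE and MAPE of 0.922, 0.038 mm/h, 0.064 mm/h and 0.221, respectively, and its prediction accuracy is better than that of the other 4 models with the same input parameters. The WOA-XGBoost model is therefore recommended for simulating the nonlinear relationship between the relevant factors and the evapotranspiration of cotton under mulched drip irrigation. The study provides a scientific basis for calculating this evapotranspiration accurately and a reference for optimizing irrigation decisions.
Key words: evapotranspiration; cotton; extreme gradient boosting tree model; whale optimization algorithm; prediction model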
CLC number: S274.4  Document code: A  Article ID: 1674-8530(2024)12-1280-07
DOI:10.3969/j.issn.1674-8530.23.0195
CAO Yuan, WANG Zhenhua, ZHANG Jihong, et al. Evapotranspiration prediction model of cotton under film drip irrigation based on WOA-XGBoost[J]. Journal of drainage and irrigation machinery engineering (JDIME), 2024, 42(12): 1280-1286. (in Chinese)
Evapotranspiration prediction model of cotton under film drip irrigation based on WOA-XGBoost
CAO Yuan1,2,3, WANG Zhenhua1,2,3*, ZHANG Jihong1,2,3, LIU Ningning1,2,3, LI Wenhao1,2,3, ZHANG Jinzhu1,2,3
(1. College of Water Conservancy & Architectural Engineering, Shihezi University, Shihezi, Xinjiang 832000, China; 2. Key Laboratory of Modern Water-saving Irrigation of Xinjiang Production & Construction Group, Shihezi, Xinjiang 832000, China; 3. Key Laboratory of Northwest Oasis Water-saving Agriculture, Ministry of Agriculture and Rural Affairs, PR China, Shihezi, Xinjiang 832000, China)
Abstract: To accurately predict cotton evapotranspiration under mulched drip irrigation, a WOA-XGBoost cotton evapotranspiration prediction model based on the whale optimization algorithm (WOA) and the extreme gradient boosting tree (XGBoost) was proposed. The maximal information coefficient (MIC) was used to identify the key factors affecting cotton evapotranspiration, and input combinations were formed according to the ranking of the correlation coefficients and fed into the WOA-XGBoost model for simulation. The predictions were compared with those obtained from the XGBoost, SVM, WOA-SVM and PSO-XGBoost models. The results show that solar radiation, minimum and maximum air temperatures, relative humidity, wind speed and soil temperature are highly correlated with cotton evapotranspiration, with MIC values of 0.722, 0.546, 0.496, 0.475, 0.379 and 0.219, respectively. The WOA-XGBoost model built on these six factors has the best overall performance, with R2, MAE, RMSE and MAPE of 0.922, 0.038 mm/h, 0.064 mm/h and 0.221, respectively. Its predictive accuracy surpasses that of the other four models using the same input parameters. Therefore, the WOA-XGBoost model is recommended for simulating the nonlinear relationship between the relevant factors and the evapotranspiration of cotton under mulched drip irrigation. This study offers a scientific foundation for accurately calculating cotton evapotranspiration under mulched drip irrigation and serves as a reference for optimizing irrigation decisions.
Key words: evapotranspiration; cotton; extreme gradient boosting tree model; whale optimization algorithm; prediction model
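Cotton is one of the main crops grown in Xinjiang, whose planting area and output rank first in China, and it holds an important place in the economy[1]. Xinjiang lies in the arid northwest, where evaporation is intense and precipitation is scarce. Cotton irrigation relies mainly on manual experience to set irrigation timing and volume, and lacks precise control based on the actual water requirement of the crop[2-3]. Reference evapotranspiration is a key parameter of crop water requirement and the basis for determining irrigation quotas[4]. Current methods for estimating crop evapotranspiration include instrument measurement, empirical formulas and machine learning. Specialized instruments are accurate but expensive[5]. Empirical formulas such as Penman-Monteith depend on highly integrated meteorological data, so acquisition conditions and data quality limit their applicability[6]. With the development of machine learning algorithms, some researchers have applied them to evapotranspiration simulation, including the support vector machine (SVM)[7], the gradient boosting decision tree (GBDT)[8] and the extreme learning machine[9]. The GBDT model is based on the boosting idea, has strong expressive power and handles the relationships among feature factors well. FAN et al.[10] used data from 8 meteorological stations in China for regression prediction; the results showed good correlation between GBDT predictions and Penman-Monteith values, and the computational cost of GBDT was lower than that of SVM. However, GBDT improves performance by reducing model bias and is prone to overfitting. Compared with GBDT, XGBoost adds a regularization term to the objective function and introduces leaf node weights to avoid overfitting, which controls model complexity and improves generalization[11].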
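In XGBoost modeling, the selection of training samples and the optimization of hyperparameters are the core steps. Studies have found that combining a model with a swarm intelligence optimization algorithm can significantly improve its performance. SAGGI et al.[12] combined fuzzy-genetic and regularized random forest methods and successfully predicted the crop coefficients and daily evapotranspiration of maize and wheat. LIU et al.[13] combined the extreme learning machine with a genetic algorithm to estimate reference evapotranspiration in southwest China; compared with the conventional extreme learning machine model, the hybrid model had better prediction accuracy. However, the genetic algorithm lacks network feedback, which leads to slow search, long iteration time and a tendency toward premature convergence. The whale optimization algorithm draws on the hunting behavior of whales, converges quickly and has strong global search ability; applying it to the parameter optimization of prediction models avoids the bias introduced by manual experience[14].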
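XGBoost has achieved good results in detection, classification and other fields[15-16], but studies applying it to evapotranspiration prediction are still rare. Most evapotranspiration simulation studies use only meteorological data as input features, and research on the combined effect of multi-source data on evapotranspiration remains scarce. In this paper, WOA is combined with the XGBoost model, the maximal information coefficient is used to explore how strongly each environmental factor affects the evapotranspiration of cotton under mulched drip irrigation, and the feasibility of the WOA-XGBoost model is evaluated in terms of both simulation accuracy and prediction cost. The predictions are also compared with those of XGBoost, SVM, WOA-SVM and XGBoost optimized by particle swarm optimization (PSO). The results can provide technical guidance for cotton irrigation decisions.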
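1 Materials and methods
1.1 Acquisition of experimental data
The experiment was carried out from April 20, 2019 at the water-saving irrigation experimental station (85°59′E, 44°19′N, altitude 412 m), where the climate is mid-temperate continental arid. Cotton was planted under film mulch in a large weighing lysimeter (model QYZS-2010) in a "one film, three drip lines, six rows" pattern with a film width of 2.05 m. The lysimeter measures 2 m×2 m×2 m and is equipped with a mechanical scale body, a high-precision weighing system and a data logger. The sensitivity of the system to changes in evaporation is ≤0.02 mm, and evapotranspiration was recorded every 1 h. Meteorological data came from an on-site DAVIS 6152C compact weather station with a sampling interval of 5 min. The main variables were maximum air temperature (Tmax), minimum air temperature (Tmin), air pressure (AP), relative humidity (RH), solar radiation (Rs), wind speed (u2), soil water content (SWC) and soil temperature (ST).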
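1.2 Data preprocessing
Data from the cotton growth period, June 4 to October 10, 2019, were used. April to May was the seedling stage, during which irrigation was withheld to promote root growth, and the first irrigation began in June. Taking the sampling times of cotton evapotranspiration as the reference, the meteorological data were resampled in Python to a uniform interval of 1 h. Data quality control was then applied: abnormal records were removed, and gaps of fewer than 3 adjacent missing samples were filled by linear interpolation, leaving 3095 valid records. To keep differences in dimension and units from affecting model training, each feature in the data set was scaled to the range 0 to 1 using Eq.(1), and the data were randomly split into a training set and a test set at a ratio of 8:2.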
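$$x_i^{*}=\frac{x_i-x_{\min}}{x_{\max}-x_{\min}},\tag{1}$$
where $x_i^{*}$, $x_i$, $x_{\max}$ and $x_{\min}$ are the normalized value, the actual value, the maximum and the minimum of the data, respectively; the subscript $i$ denotes the $i$-th feature value.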
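The scaling in Eq.(1) and the 8:2 random split can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the random seed and the sample values are assumptions made for the example, since the paper does not give its preprocessing code.

```python
import random

def min_max_scale(column):
    """Scale one feature column to [0, 1] following Eq.(1)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_train_test(rows, train_ratio=0.8, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical hourly solar radiation readings (illustrative values only)
rs = [0.0, 120.0, 480.0, 960.0, 600.0]
scaled = min_max_scale(rs)

# Ten hypothetical sample indices split 8:2
train, test = split_train_test(list(range(10)))
```

Each feature column is scaled independently, so variables with large numeric ranges such as Rs do not dominate variables with small ranges such as SWC.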
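2 Cotton evapotranspiration prediction model
2.1 Maximal information coefficient
MIC was first proposed by RESHEF et al.[17] in 2011. It measures the relationship between 2 random variables from their joint probability density. Compared with commonly used correlation measures such as Pearson and Spearman, the main advantage of MIC is that it captures not only linear relationships between variables but also nonlinear ones.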
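The mutual information MI(x,y) is the basis of MIC and is defined as
$$MI(x,y)=\iint p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y,\tag{2}$$
where $p(x,y)$ is the joint probability density function of $x$ and $y$; $p(x)$ and $p(y)$ are the marginal density functions.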
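Let D be a finite two-dimensional data set. Partitioning x and y into m and n segments, respectively, forms an m×n grid G, and MI*(D,x,y) is the maximum mutual information obtained when D is partitioned by the grid G, i.e.
$$MI^{*}(D,x,y)=\max MI(D|G),\tag{3}$$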
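The maximum mutual information MI*(x,y) of the data set D is computed over the different discretizations and normalized, which gives the maximal information coefficient MIC as
$$MIC(x,y)=\max_{mn<B}\frac{MI^{*}(x,y)}{\log_2\min\{m,n\}},\tag{4}$$
where MIC takes values in [0,1]; B is the upper limit on the number of grid cells.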
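2.2 XGBoost algorithm
XGBoost is a boosting ensemble algorithm in which each decision tree successively learns the residual between the prediction of the preceding trees and the true value, and the outputs of all trees are accumulated as the final prediction. XGBoost uses second-order derivatives in a Taylor expansion of the cost function. Its objective function consists of 2 parts, a loss function and a regularization term, and is defined as
$$Obj=\sum_{i=1}^{n}l(y_i,\hat{y}_i)+\sum_{t}\Omega(f_t),\tag{5}$$
where $n$ is the total number of samples; $y_i$ is the measured value of a sample; $\hat{y}_i$ is the model prediction; $l(y_i,\hat{y}_i)$ is the loss of a single sample; $\Omega(f_t)$ is the regularization term, which prevents overfitting and is defined as
$$\Omega(f_t)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2},\tag{6}$$
where $\gamma$ and $\lambda$ are the parameters controlling the tree structure and the distribution of leaf node weights, respectively; $T$ is the number of leaf nodes; $w_j$ is the score of leaf node $j$ ($j=1,2,\dots,T$).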
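The objective function at iteration t is
$$Obj^{(t)}=\sum_{i=1}^{n}l\left[y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right]+\Omega(f_t),\tag{7}$$
Expanding this equation with Taylor's formula gives
$$Obj^{(t)}\approx\sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t),\tag{8}$$
where $g_i$ is the first derivative of $l(y_i,\hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$; $h_i$ is the second derivative of $l(y_i,\hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$. With $G_j$ and $H_j$ denoting the sums of the first and second derivatives over the samples in leaf node $j$, solving for the weight $w_j^{*}$ of leaf node $j$ and substituting it into the objective function gives Eqs.(9) and (10).
$$w_j^{*}=-\frac{G_j}{H_j+\lambda},\tag{9}$$
$$Obj^{(t)}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}+\gamma T.\tag{10}$$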
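As a worked illustration of Eqs.(9) and (10), the sketch below evaluates the leaf weights and the objective for a tiny hand-made tree. It assumes the squared-error loss l = (y - ŷ)²/2, for which g_i = ŷ_i - y_i and h_i = 1; the loss choice, the function names and the sample values are all assumptions for the example and are not taken from the paper.

```python
def gradients_squared_error(y_true, y_prev):
    """First and second derivatives of l = 0.5 * (y - y_hat)^2 w.r.t. y_hat."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0 for _ in y_true]
    return g, h

def leaf_weight(g, h, lam):
    """Optimal leaf weight w_j* = -G_j / (H_j + lambda), Eq.(9)."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """Objective of Eq.(10): -0.5 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    total = 0.0
    for g, h in leaves:
        G, H = sum(g), sum(h)
        total += G * G / (H + lam)
    return -0.5 * total + gamma * len(leaves)

# Hypothetical tree with two leaves; the previous prediction is 0 everywhere
leaf_a = gradients_squared_error([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
leaf_b = gradients_squared_error([10.0], [0.0])

lam, gamma = 1.0, 0.5  # assumed regularization settings
w_a = leaf_weight(*leaf_a, lam)   # 6 / (3 + 1) = 1.5
w_b = leaf_weight(*leaf_b, lam)   # 10 / (1 + 1) = 5.0
score = structure_score([leaf_a, leaf_b], lam, gamma)  # -0.5 * (9 + 50) + 1 = -28.5
```

With λ = 0 the weight of leaf A would be the plain mean residual 2.0; the regularization shrinks it to 1.5, which is how the λ term in Eq.(6) limits the contribution of each leaf.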
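2.3 WOA-optimized XGBoost
The whale optimization algorithm (WOA) is a swarm intelligence optimization algorithm based on the hunting strategy of humpback whales. The whale population searches globally by random movement, individuals encircle the current best solution as their target, and hunting proceeds through shrinking encirclement and spiral movement. Through iterative updates the population approaches the target prey and finally obtains the global optimum.
WOA and XGBoost are combined to form the WOA-XGBoost hybrid model shown in Fig.1. WOA optimizes the hyperparameters of XGBoost, which improves the efficiency with which the model simulates cotton evapotranspiration. Seven important XGBoost parameters were selected for optimization. These 7 hyperparameters strongly affect the fitting performance of the model, have wide admissible ranges and are difficult to set optimally from experience; Table 1 gives the parameter tuning information. The main steps of the WOA-XGBoost model are as follows: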
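1) Initialize the population and the number of iterations in WOA, and set the search range of each XGBoost parameter.
2) Feed the training set into the XGBoost model, define the fitness function as the mean root mean square error of 5-fold cross-validation on the training set, and initialize the global best and the individual best of the whales according to the fitness values.
3) In each iteration, substitute the candidate parameters into the objective function to compute the fitness of each individual, and update the positions of the whales according to the current best position.
4) When the iteration terminates, output the best whale position, which is the optimal parameter set of the XGBoost model.
5) Feed the optimal parameters into the XGBoost model for simulation and evaluate the model performance.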
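The search loop in steps 1) to 4) can be sketched in plain Python as below. This is a minimal WOA following the encircling, random-search and spiral updates described above, with one scalar coefficient pair per whale and an immediate update of the best position, both of which are simplifications. To keep the example self-contained, the fitness is a hypothetical quadratic function standing in for the 5-fold cross-validated RMSE of XGBoost; the function names, bounds, population size and seed are assumptions.

```python
import math
import random

def woa_minimize(fitness, bounds, n_whales=20, n_iter=100, seed=0):
    """Minimize `fitness` over box `bounds` with a basic whale optimization loop."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    # Step 1: random initial population inside the search range
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    # Step 2: evaluate fitness and record the best whale
    scores = [fitness(w) for w in whales]
    best_idx = min(range(n_whales), key=lambda i: scores[i])
    best, best_score = whales[best_idx][:], scores[best_idx]
    history = [best_score]

    # Step 3: iterative position updates
    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 to 0
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise follow a random whale
                target = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [target[d] - A * abs(C * target[d] - x[d]) for d in range(dim)]
            else:
                # Spiral move around the best whale (spiral constant b = 1)
                l = rng.uniform(-1.0, 1.0)
                new = [abs(best[d] - x[d]) * math.exp(l) * math.cos(2.0 * math.pi * l)
                       + best[d] for d in range(dim)]
            whales[i] = clip(new)
            score = fitness(whales[i])
            if score < best_score:
                best, best_score = whales[i][:], score
        history.append(best_score)

    # Step 4: the best position is the selected parameter set
    return best, best_score, history

def toy_fitness(params):
    """Hypothetical stand-in for cross-validated RMSE, minimal at (0.3, 0.7)."""
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2

best, best_score, history = woa_minimize(toy_fitness, [(0.0, 1.0), (0.0, 1.0)])
```

In an actual tuning run, `toy_fitness` would be replaced by a function that trains XGBoost with the candidate hyperparameters and returns the mean 5-fold RMSE, and integer parameters such as the tree depth would need to be rounded before use.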
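2.4 Model evaluation metrics
The mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE) and coefficient of determination (R2) were used to evaluate the prediction performance of the WOA-XGBoost model.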
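The 4 metrics can be written out as follows. The paper does not list the formulas, so this sketch uses the standard definitions, with MAPE expressed as a fraction to match the reported values such as 0.221. It assumes that all measured values are nonzero, since MAPE is undefined otherwise; the function names and sample numbers are illustrative only.

```python
import math

def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def rmse(measured, predicted):
    """Root mean square error."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def mape(measured, predicted):
    """Mean absolute percentage error as a fraction; measured values must be nonzero."""
    return sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted)) / len(measured)

def r2(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Illustrative numbers only, not measurements from the experiment
et_measured = [1.0, 2.0, 4.0]
et_predicted = [1.0, 2.5, 3.5]
metrics = {
    "MAE": mae(et_measured, et_predicted),
    "RMSE": rmse(et_measured, et_predicted),
    "MAPE": mape(et_measured, et_predicted),
    "R2": r2(et_measured, et_predicted),
}
```

Lower MAE, RMSE and MAPE and a higher R2 indicate better agreement between predicted and measured evapotranspiration.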
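3 Results and analysis
3.1 Correlation analysis of factors affecting cotton evapotranspiration
The input features of a nonlinear model determine both its prediction accuracy and its complexity. To find the best driving factors of cotton evapotranspiration under mulched drip irrigation and to ensure sufficient dimensionality reduction of the feature subset, MIC was used to measure the correlation between cotton evapotranspiration and the multi-source environmental factors. As shown in Table 2, the correlation with evapotranspiration differs markedly among the factors, with MIC values of 0.092 to 0.722; Rs has the highest correlation with evapotranspiration and SWC the lowest. After ranking by correlation coefficient, Rs, Tmin, Tmax and RH, together with several combinations of the remaining factors, were selected as inputs to the WOA-XGBoost model, as shown in Table 3.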
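3.2 Analysis of evapotranspiration prediction results
Fig.2 shows the simulated evapotranspiration for the 5 input combinations over the growth period of cotton under mulched drip irrigation. As the input dimension increases, the R2 of the prediction model gradually increases and the MAE, RMSE and MAPE gradually decrease; the change is pronounced from combination C1 to combination C3 and levels off from combination C3 to combination C5. In combination C1, the 4 most important meteorological factors were used as model inputs, and the model achieved relatively high prediction accuracy (R2=0.864, MAE=0.052 mm/h, RMSE=0.085 mm/h, MAPE=0.284). With combination C2, the prediction accuracy improved further as more meteorological factors were added, which indicates that evapotranspiration is highly correlated with multiple meteorological variables and that the degree of influence agrees with the correlation analysis above. Combination C3 added soil temperature, and R2, MAE, RMSE and MAPE improved by 5.30%, 23.75%, 20.69% and 19.53%, respectively. This marked gain in accuracy shows that the intensity of evapotranspiration involves multiple environmental factors and that soil factors are another key influence on evapotranspiration during the growth period of cotton under mulched drip irrigation. In addition, from combination C3 to combination C5 the number of input factors increased from 6 to 8, but the model R2 increased by only 0.30%. In summary, without sacrificing model accuracy, combination C3 is the optimal input combination for a prediction model with low complexity and high efficiency. The optimal hyperparameters obtained for the XGBoost model were n_estimators=410, learning_rate=0.015, max_depth=18, min_child_weight=5, gamma=0, subsample=0.15 and reg_alpha=0.022.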
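3.3 Comparative analysis of model predictions
To verify the effectiveness of the WOA-XGBoost model, SVM, XGBoost, PSO-XGBoost and WOA-SVM, which are algorithms commonly used for crop evapotranspiration estimation in China and abroad, were selected for comparison with the model built in this paper. Under identical experimental conditions, with the same data set and the same number of iterations, the standalone SVM model and the standalone XGBoost model were tuned by grid search, and Rs, Tmin, Tmax, RH, u2 and ST were used as input features.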
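Table 4 lists the evaluation metrics of the 5 machine learning models for predicting the evapotranspiration of cotton under mulched drip irrigation on the test set.
As Table 4 shows, the WOA-XGBoost model has the best prediction accuracy. Compared with the XGBoost, SVM, PSO-XGBoost and WOA-SVM models, its MAE is lower by 2.56% to 20.83%, its RMSE by 3.03% to 11.11% and its MAPE by 2.21% to 43.19%. Among the standalone models, XGBoost clearly outperforms SVM. Combining a standalone model with an optimization algorithm improves accuracy markedly: the MAE, RMSE and MAPE of the XGBoost model decrease by 4.88% to 7.32%, 2.94% to 5.88% and 4.24% to 6.36%, respectively, and those of the SVM model decrease by 2.04%, 1.39% and 6.17%, respectively. This shows that hybrid models handle complex nonlinear data better than standalone models, and that the XGBoost, PSO-XGBoost and WOA-XGBoost models give more stable predictions and better robustness than the SVM and WOA-SVM models. Moreover, among the 3 XGBoost models, the WOA-optimized model reduces MAE, RMSE and MAPE by 2.56%, 3.03% and 2.21%, respectively, compared with hyperparameter tuning by PSO.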
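Fig.3 shows how the predicted evapotranspiration ETe and the measured evapotranspiration ETm are distributed for the 3 XGBoost models.
As Fig.3 shows, the scatter points of the WOA-XGBoost model are evenly distributed on both sides of the trend line and lie closer to the 1:1 line. Its fit is the best, followed by the PSO-XGBoost model and the XGBoost model, and the errors between predictions and measurements generally remain within a small range. In summary, the WOA-XGBoost model has better fitting and generalization ability, which confirms its relative superiority in predicting the evapotranspiration of cotton under mulched drip irrigation.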
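4 Discussion
At present, the prediction of crop evapotranspiration is mostly limited to meteorological data and takes values calculated by FAO Penman-Monteith as the prediction target, which still differ somewhat from actual evapotranspiration. This paper builds a more accurate prediction model from lysimeter data, which allows a fuller understanding of how meteorological and soil factors contribute and respond to the water demand of cotton.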
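The MIC results show that solar radiation is the key factor affecting cotton evapotranspiration, whereas soil water content is only weakly correlated with it. This differs from the analysis by HE Shulin et al.[18] of the causes of evapotranspiration in apple orchards. The reason is that, under mulched drip irrigation, the film greatly reduces evaporation, which weakens the influence of soil water content on cotton evapotranspiration. On the other hand, cotton physiological data may also affect cotton evapotranspiration to varying degrees, and the combined effects of meteorological, soil and physiological factors require further study.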
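In a prediction model, the number of input factors is one of the key considerations. Fig.2 shows that the prediction accuracy differs little between 8 input factors and 6 input factors. Of the 2 prediction models XGBoost and SVM, XGBoost performs better, because the XGBoost algorithm weighs the predictions of multiple decision trees and fits the residuals iteratively, so it can mine feature information more deeply. The optimized hybrid algorithms improve prediction markedly, which is consistent with the conclusion of WU Lifeng et al.[19]. The WOA-optimized XGBoost model performs best, which indicates that WOA has stronger global search ability during hyperparameter optimization and can escape local optima, improving prediction accuracy.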
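5 Conclusions
1) MIC correlation analysis was carried out on the meteorological and soil data affecting the evapotranspiration of cotton under mulched drip irrigation. The results show that this evapotranspiration is governed mainly by Rs, followed by Tmin, Tmax, RH and u2; among the soil data, ST is another key factor affecting evapotranspiration.
2) Input combinations were built in order of the importance of the factors affecting evapotranspiration. Rs, Tmin, Tmax, RH, u2 and ST gave the best overall performance in the simulations and were used as the input parameters of the prediction model, which effectively reduced the time cost of modeling and improved training efficiency.
3) The proposed WOA-XGBoost model was compared with 4 other models. Hybrid models that incorporate a swarm intelligence optimization algorithm outperform standalone models. Among the hybrid models, WOA-XGBoost shows better hyperparameter optimization ability; its R2, MAE, RMSE and MAPE reached 0.922, 0.038 mm/h, 0.064 mm/h and 0.221, respectively, the best prediction accuracy, which demonstrates the applicability of the WOA-XGBoost model to predicting the evapotranspiration of cotton under mulched drip irrigation.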
References
[1] WANG Hongbo,LI Guohui,XU Xuewen,et al.Assessing the sustainability of cotton production under climate change based on the AquaCrop model[J].Chinese journal of agrometeorology,2023,44(7):588-598.(in Chinese)
[2] DING Yu,ZHANG Jianghui,BAI Yungang,et al.Effects of different dry sowing and wet-out water treatments on cotton physiology,growth characteristics and yield under double film conditions[J].Xinjiang agricultural sciences,2023,60(4):810-822.(in Chinese)
[3] ZHENG Ming,BAI Yungang,ZHANG Jianghui,et al.Effects of irrigation quantity and irrigation frequency of dry sowing and wet seedling on soil compaction,water-salt distribution and seedling emergence in cotton fields[J].Agricultural research in the arid areas,2022,40(6):100-107.(in Chinese)
[4] FANG R M,SONG S J.Daily reference evapotranspiration prediction of Tieguanyin tea plants based on mathematical morphology clustering and improved generalized regression neural network[J].Agricultural water management,2020,236:106177.
[5] LIU Yanping,DU Yali,NIE Mingjun,et al.Design of crop phenotype and evapotranspiration monitoring system based on weighing lysimeter and multi-sensors[J].Transactions of the CSAE,2019,35(1):114-122.(in Chinese)
[6] CHIA M Y,HUANG Y F,KOO C H,et al.Swarm-based optimization as stochastic training strategy for estimation of reference evapotranspiration using extreme learning machine[J].Agricultural water management,2021,243:106447.
[7] TANG D H,FENG Y,GONG D Z,et al.Evaluation of artificial intelligence models for actual crop evapotranspiration modeling in mulched and non-mulched maize croplands[J].Computers and electronics in agriculture,2018,152:375-384.
[8] ZHAO L,ZHAO X B,ZHOU H M,et al.Prediction model for daily reference crop evapotranspiration based on hybrid algorithm and principal components analysis in southwest China[J].Computers and electronics in agriculture,2021,190(1/2):106424.
[9] ZHANG Haojie,CUI Ningbo,XU Ying,et al.Prediction for reference crop evapotranspiration in arid northwest China based on ELM[J].Journal of drainage and irrigation machinery engineering,2018,36(8):779-784.(in Chinese)
[10] FAN J L,YUE W J,WU L F,et al.Evaluation of SVM,ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China[J].Agricultural and forest meteorology,2018,263:225-241.
[11] CHEN T Q,GUESTRIN C. XGBoost:a scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York,United States:Association for Computing Machinery,2016:785-794.
[12] SAGGI M K,JAIN S. Application of fuzzy-genetic and regularization random forest (FG-RRF):Estimation of crop evapotranspiration (ETc) for maize and wheat crops[J].Agricultural water management,2020,229:105907.
[13] LIU Q S,WU Z J,CUI N B,et al.Genetic algorithm-optimized extreme learning machine model for estimating daily reference evapotranspiration in southwest China[J].Atmosphere,2022,13(6):971.
[14] MIRJALILI S,LEWIS A. The whale optimization algorithm[J].Advances in engineering software,2016,95:51-67.
[15] YUAN Lifen,LI Song,YIN Baiqiang,et al.Accurate and fast ElectroCardioGram classification method based on adaptive fast S-Transform and XGBoost[J].Journal of electronics & information technology,2023,45(4):1464-1474.(in Chinese)
[16] DU Yuchuan,DU Zhouyang,LIU Chenglong.Road diseases recognition of ground penetrating radar based on extreme gradient boosting[J].Journal of Tongji University(natural science),2020,48(12):1742-1750.(in Chinese)
[17] RESHEF D N,RESHEF Y A,FINUCANE H K,et al.Detecting novel associations in large data sets[J].Science,2011,334(6062):1518-1524.
[18] HE Shulin,LIU Huimin,JIN Liqiang,et al.Calculating demands of fruit trees for water using neural network algorithm[J].Journal of irrigation and drainage,2022,41(1):19-24.(in Chinese)
[19] WU Lifeng,LU Xianghui,LIU Xiaoqiang,et al.Simulation of reference crop evapotranspiration by using bat algorithm optimization based extreme learning machine[J].Journal of drainage and irrigation machinery engineering,2018,36(9):802-805.(in Chinese)
(Responsible editor: HUANG Xinxin)