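MO Xiao-yun, ZHOU Jie-ming, JIN Fang
Abstract The history dependent decision model (HDDM) and the history dependent decision process (HDDP) are the general case of decision models and their corresponding decision processes. The Markov decision model (MDM) and the Markov decision process (MDP) are special cases of the HDDM and the HDDP. This paper rigorously establishes the history dependent decision model and proves the existence of the corresponding history dependent decision process; the proof is constructive. As a special case of the HDDM and the HDDP, the Markov decision model (MDM) is established and the corresponding Markov decision process (MDP) is constructed.
Key words establishment of the history dependent decision model; existence and construction of the history dependent decision process; Markov decision model and Markov decision process; Markov process
CLC number O212.5 Document code A Article ID 1000-2537(2017)05-0088-07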
Establishment of History Dependent Decision Models and Construction of Corresponding Processes
MO Xiao-yun1,2, ZHOU Jie-ming2, JIN Fang3*
(1. College of Mathematics and Statistics, Hunan University of Finance and Economics, Changsha 410205, China;
2. College of Mathematics and Computer Science, Key Laboratory of High Performance Computing and Stochastic
Information Processing, Ministry of Education of China, Hunan Normal University, Changsha 410081, China;
3. College of Mathematics and Computing Science, Hunan City University, Yiyang 413000, China)
Abstract The history dependent decision model (HDDM) and the history dependent decision process (HDDP) are the most general cases of decision models and their corresponding processes. The Markov decision model (MDM) and the Markov decision process (MDP) are special cases of the HDDM and the HDDP. In this work, the history dependent decision model is rigorously established and the existence of the corresponding history dependent decision process is proved; the proof is constructive. As special cases of the HDDM and the HDDP, the Markov decision model is established and the corresponding Markov decision process is constructed.
Key words history dependent decision model; Markov decision model; Markov decision process; Markov process
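In a decision control system described by a Markov decision model (MDM) and the corresponding Markov decision process (MDP), the future state of the system depends only on the current state and on the decision action currently taken. If the future state depends instead on the system's past states and past decision actions, we have a history dependent decision model (HDDM) and the corresponding history dependent decision process (HDDP). Because the HDDM and the HDDP are so general, they are hard to study in depth. Markov decision models and their processes, by contrast, have been studied thoroughly, with a rich body of results [1-5]. Many monographs and papers on Markov decision models and processes mention history dependent decision models and processes only briefly, without giving a detailed and precise account of how the model is established or how the corresponding process is constructed. Carrying out this establishment and construction is therefore necessary. We have previously studied the construction of many similar models and their processes [6-10], and this paper draws on the ideas and methods of [6-11].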
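The contrast between the two settings can be written schematically as follows. This is only an illustrative sketch: the symbols X_n (state at time n), A_n (action at time n), B (a measurable set of states) and Q_n (a transition kernel) are assumed notation and are not taken from this paper.

```latex
% Assumed notation (not from the source): X_n is the state and A_n the action at time n,
% B is a measurable set of states, and Q_n is a transition kernel.
\begin{align*}
\text{MDM:}\quad  & P\bigl(X_{n+1}\in B \mid X_0, A_0, \dots, X_n, A_n\bigr) = Q_n\bigl(B \mid X_n, A_n\bigr),\\
\text{HDDM:}\quad & P\bigl(X_{n+1}\in B \mid X_0, A_0, \dots, X_n, A_n\bigr) = Q_n\bigl(B \mid X_0, A_0, \dots, X_n, A_n\bigr).
\end{align*}
```

In the first line the kernel sees only the current pair (X_n, A_n); in the second it may use the entire history, which is what makes the Markov model a special case.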
1 History Dependent Decision Model
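Consider a system controlled by a decision maker, whose state depends on time, on the system's past states, and on the decision maker's past decision actions. Time may be continuous, but discrete time is closer to actual practice. Assume that time takes the values n=0,1,2,…,N, where N is a positive integer, also called the terminal time. Suppose that at some time the system is in a state x and the decision maker can take a decision action a; at the next time, the system moves from x to some state y. If the decision maker takes one decision action at each time n∈{0,1,2,…,N-1}, these N actions together constitute a decision policy. A policy is thus distinct from an action. One goal of studying decision models is to choose the best policy, so that some performance index of the system is optimized. For example, consider an investor, who is the decision maker; the state of the system is his wealth. If he wants to maximize his wealth at the terminal time, then how he invests is his policy.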
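The informal description above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the construction given in this paper: states and actions are taken to be integers, and the names `simulate`, `always_invest`, `markov_kernel` and `history_kernel`, together with the specific transition probabilities, are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

State = int
Action = int
Past = Tuple[Tuple[State, Action], ...]   # (x_0, a_0), ..., (x_{n-1}, a_{n-1})
Policy = Callable[[int, Past, State], Action]
Kernel = Callable[[int, Past, State, Action], Dict[State, float]]


def simulate(x0: State, N: int, policy: Policy, kernel: Kernel,
             rng: random.Random) -> Tuple[List[Tuple[State, Action]], State]:
    """Run one trajectory over times n = 0, 1, ..., N and return (past, x_N)."""
    past: List[Tuple[State, Action]] = []
    x = x0
    for n in range(N):                       # decisions are made at n = 0, ..., N-1
        a = policy(n, tuple(past), x)        # the action may depend on the whole history
        dist = kernel(n, tuple(past), x, a)  # and so may the law of the next state
        states = list(dist)
        y = rng.choices(states, weights=[dist[s] for s in states], k=1)[0]
        past.append((x, a))
        x = y
    return past, x


def always_invest(n: int, past: Past, x: State) -> Action:
    """Hypothetical policy: invest one unit (a = 1) at every time."""
    return 1


def markov_kernel(n: int, past: Past, x: State, a: Action) -> Dict[State, float]:
    """Markov case (assumed example): the next-state law uses only (x, a)."""
    if a == 0:
        return {x: 1.0}
    return {x + 1: 0.5, x - 1: 0.5}


def history_kernel(n: int, past: Past, x: State, a: Action) -> Dict[State, float]:
    """History dependent case (assumed example): the chance of a gain
    shrinks with the number of past investments."""
    if a == 0:
        return {x: 1.0}
    invested = sum(1 for _, b in past if b == 1)
    p_up = 1.0 / (2.0 + invested)
    return {x + 1: p_up, x - 1: 1.0 - p_up}


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate(10, 5, always_invest, markov_kernel, rng))
    print(simulate(10, 5, always_invest, history_kernel, rng))
```

Here the state plays the role of the investor's wealth and the N actions chosen by the policy form one trajectory. Passing `markov_kernel` or `history_kernel` to the same `simulate` function shows that the Markov model is obtained simply by ignoring the `past` argument.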
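Theorem 6 shows that, for a history dependent decision process, if one is interested only in its value function, it suffices to study the Markov decision process.
Acknowledgements We thank the teachers of the “Risk Theory and Stochastic Control” seminar for proposing the research problem and for their valuable suggestions.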
References:
[1] BAUERLE N, RIEDER U. Markov decision processes with applications to finance [M]. Berlin: Springer-Verlag, 2011.
[2] GUO X P, HERNÁNDEZ-LERMA O. Continuous-time Markov decision processes [M]. Berlin: Springer-Verlag, 2009.
[3] GUO X P, HERNÁNDEZ-LERMA O, PRIETO-RUMEAU T. A survey of recent results on continuous-time Markov decision processes [J]. Top, 2006,14(2):177-246.
[4] HINDERER K. Foundations of non-stationary dynamic programming with discrete time parameter [M]. Berlin: Springer-Verlag, 1970.
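[5] YAN J A. Lectures on measure theory (2nd ed.) [M]. Beijing: Science Press, 2004.
[6] MO X Y. An assembly method for constructing dependent random variables from independent product spaces [J]. Journal of Natural Science of Hunan Normal University, 2010,33(2):3-6.
[7] MO X Y, OU H, ZHOU J M. Equivalence theorem and probabilistic construction of the Markov dependent risk model [J]. Jingji Shuxue (Mathematics in Economics), 2012,29(1):61-64.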
[8] MO X Y,YANG X Q. Criterion of semi-Markov dependent risk model [J]. Acta Math Sin, 2014,30B(7):1237-1280.
[9] MO X Y,ZHOU J M, OU H, et al. Double Markov risk model [J]. Acta Math Sci, 2013,33B(2):330-340.
[10] MO X Y, YANG X Q. Path characterization and probabilistic construction of the Markov-modulated risk model [J]. Acta Mathematicae Applicatae Sinica, 2012,35(3):385-394.
[11] ZHOU J M, MO X Y, OU H, et al. Expected present value of total dividends in the compound binomial model with delayed claims and random income [J]. Acta Math Sci, 2013,33B(6):1639-1651.