Risk probability optimization problem for finite horizon continuous time Markov decision processes with loss rate
Kybernetika, Volume 57 (2021) no. 2, pp. 272-294
This article was harvested from the Czech Digital Mathematics Library


This paper studies risk probability optimality for finite-horizon continuous-time Markov decision processes with a loss rate and unbounded transition rates. Under a drift condition, which is slightly weaker than the regular condition detailed in the existing literature on risk probability optimality for semi-Markov decision processes, we prove that the value function is the unique solution of the corresponding optimality equation and establish the existence of a risk probability optimal policy using an iteration technique. Furthermore, we verify the imposed condition on two examples, a controlled birth-and-death system and a risk control problem, and show that a value iteration algorithm can be used to compute the value function and construct an optimal policy.
DOI: 10.14736/kyb-2021-2-0272
Classification: 60E20, 90C40
Keywords: continuous-time Markov decision processes; loss rate; risk probability criterion; finite horizon; optimal policy; unbounded transition rate
@article{10_14736_kyb_2021_2_0272,
     author = {Huo, Haifeng and Wen, Xian},
     title = {Risk probability optimization problem for finite horizon continuous time {Markov} decision processes with loss rate},
     journal = {Kybernetika},
     pages = {272--294},
     year = {2021},
     volume = {57},
     number = {2},
     doi = {10.14736/kyb-2021-2-0272},
     mrnumber = {4273576},
     zbl = {07396267},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-2-0272/}
}
TY  - JOUR
AU  - Huo, Haifeng
AU  - Wen, Xian
TI  - Risk probability optimization problem for finite horizon continuous time Markov decision processes with loss rate
JO  - Kybernetika
PY  - 2021
SP  - 272
EP  - 294
VL  - 57
IS  - 2
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-2-0272/
DO  - 10.14736/kyb-2021-2-0272
LA  - en
ID  - 10_14736_kyb_2021_2_0272
ER  - 
%0 Journal Article
%A Huo, Haifeng
%A Wen, Xian
%T Risk probability optimization problem for finite horizon continuous time Markov decision processes with loss rate
%J Kybernetika
%D 2021
%P 272-294
%V 57
%N 2
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-2-0272/
%R 10.14736/kyb-2021-2-0272
%G en
%F 10_14736_kyb_2021_2_0272
Huo, Haifeng; Wen, Xian. Risk probability optimization problem for finite horizon continuous time Markov decision processes with loss rate. Kybernetika, Volume 57 (2021) no. 2, pp. 272-294. doi: 10.14736/kyb-2021-2-0272
