Strong average optimality criterion for continuous-time Markov decision processes
Kybernetika, Volume 50 (2014) no. 6, pp. 950-977
This article was harvested from the Czech Digital Mathematics Library

This paper deals with continuous-time Markov decision processes with unbounded transition rates under the strong average cost criterion. The state and action spaces are Borel spaces, and the costs are allowed to be unbounded from above and from below. Under mild conditions, we first prove that the finite-horizon optimal value function is a solution to the optimality equation for the case of uncountable state spaces and unbounded transition rates, and that there exists an optimal deterministic Markov policy. Then, using the two average optimality inequalities, we show that the set of all strong average optimal policies coincides with the set of all average optimal policies, and thus obtain the existence of strong average optimal policies. Furthermore, employing the technique of skeleton chains of controlled continuous-time Markov chains and the Chapman-Kolmogorov equation, we give a new set of sufficient conditions, imposed on the primitive data of the model, for verifying the uniform exponential ergodicity of continuous-time Markov chains governed by stationary policies. Finally, we illustrate our main results with an example.
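For orientation, the criteria named in the abstract can be sketched in the form commonly used in the strong average optimality literature. The notation below ($V_T$, $V_T^*$, $J$, $c$, $\xi_t$, $a_t$, $P_t$, $h$) is assumed for illustration and is not taken from the paper, whose precise definitions and conditions may differ.

```latex
% Assumed notation: \xi_t is the state, a_t the action, c the cost rate,
% and E_x^\pi the expectation under policy \pi started from state x.
\begin{align*}
  V_T(x,\pi) &= E_x^{\pi}\!\left[\int_0^T c(\xi_t, a_t)\,dt\right],
  & V_T^*(x) &= \inf_{\pi} V_T(x,\pi), \\
  J(x,\pi) &= \limsup_{T\to\infty} \frac{1}{T}\, V_T(x,\pi).
\end{align*}
% Average optimality of \pi^*:
\[
  J(x,\pi^*) = \inf_{\pi} J(x,\pi) \quad \text{for all } x .
\]
% Strong average optimality of \pi^* (comparison with the finite-horizon optimum):
\[
  \lim_{T\to\infty} \frac{1}{T}\,\bigl[\,V_T(x,\pi^*) - V_T^*(x)\,\bigr] = 0
  \quad \text{for all } x .
\]
% Skeleton chain: for a transition function (P_t)_{t \ge 0} and a step h > 0,
% the Chapman-Kolmogorov equation
\[
  P_{s+t}(x,B) = \int P_s(x,dy)\,P_t(y,B)
\]
% gives P_{nh} = (P_h)^n, so the process sampled at times 0, h, 2h, \dots
% is a discrete-time Markov chain with one-step kernel P_h.
```

Read this way, the abstract's claim is that the last two optimality notions single out the same set of policies, and that ergodicity of the continuous-time chain is studied through its sampled chain with kernel $P_h$.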
DOI: 10.14736/kyb-2014-6-0950
Classification: 49K45, 90C40, 93E20
Keywords: continuous-time Markov decision processes; strong average optimality criterion; finite-horizon expected total cost criterion; unbounded transition rates; optimal policy; optimal value function
@article{10_14736_kyb_2014_6_0950,
     author = {Wei, Qingda and Chen, Xian},
     title = {Strong average optimality criterion for continuous-time {Markov} decision processes},
     journal = {Kybernetika},
     pages = {950--977},
     year = {2014},
     volume = {50},
     number = {6},
     doi = {10.14736/kyb-2014-6-0950},
     mrnumber = {3301781},
     zbl = {1307.93467},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2014-6-0950/}
}
TY  - JOUR
AU  - Wei, Qingda
AU  - Chen, Xian
TI  - Strong average optimality criterion for continuous-time Markov decision processes
JO  - Kybernetika
PY  - 2014
SP  - 950
EP  - 977
VL  - 50
IS  - 6
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2014-6-0950/
DO  - 10.14736/kyb-2014-6-0950
LA  - en
ID  - 10_14736_kyb_2014_6_0950
ER  - 
%0 Journal Article
%A Wei, Qingda
%A Chen, Xian
%T Strong average optimality criterion for continuous-time Markov decision processes
%J Kybernetika
%D 2014
%P 950-977
%V 50
%N 6
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2014-6-0950/
%R 10.14736/kyb-2014-6-0950
%G en
%F 10_14736_kyb_2014_6_0950
Wei, Qingda; Chen, Xian. Strong average optimality criterion for continuous-time Markov decision processes. Kybernetika, Volume 50 (2014) no. 6, pp. 950-977. doi: 10.14736/kyb-2014-6-0950

[1] Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer, Berlin 2011. | MR | Zbl

[2] Bertsekas, D. P., Shreve, S. E.: Stochastic Optimal Control: The Discrete-time Case. Academic Press, New York 1978. | MR | Zbl

[3] Cavazos-Cadena, R., Fernández-Gaucherand, E.: Denumerable controlled Markov chains with strong average optimality criterion: bounded and unbounded costs. Math. Methods Oper. Res. 43 (1996), 281-300. | DOI | MR | Zbl

[4] Dijk, N. M. van: On the finite horizon Bellman equation for controlled Markov jump models with unbounded characteristics: existence and approximation. Stochastic Process. Appl. 28 (1988), 141-157. | MR

[5] Dynkin, E. B., Yushkevich, A. A.: Controlled Markov Processes. Springer, New York 1979. | MR

[6] Feller, W.: On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48 (1940), 488-515. | DOI | MR | Zbl

[7] Flynn, J.: On optimality criteria for dynamic programs with long finite horizons. J. Math. Anal. Appl. 76 (1980), 202-208. | DOI | MR | Zbl

[8] Ghosh, M. K., Marcus, S. I.: On strong average optimality of Markov decision processes with unbounded costs. Oper. Res. Lett. 11 (1992), 99-104. | DOI | MR | Zbl

[9] Ghosh, M. K., Saha, S.: Continuous-time controlled jump Markov processes on the finite horizon. In: Optimization, Control, and Applications of Stochastic Systems (D. Hernández-Hernández and J. A. Minjárez-Sosa, eds.), Springer, New York 2012, pp. 99-109. | MR

[10] Gihman, I. I., Skorohod, A. V.: Controlled Stochastic Processes. Springer, Berlin 1979. | MR

[11] Guo, X. P., Rieder, U.: Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Probab. 16 (2006), 730-756. | DOI | MR | Zbl

[12] Guo, X. P.: Continuous-time Markov decision processes with discounted rewards: the case of Polish spaces. Math. Oper. Res. 32 (2007), 73-87. | DOI | MR | Zbl

[13] Guo, X. P., Hernández-Lerma, O.: Continuous-Time Markov Decision Processes: Theory and Applications. Springer, Berlin 2009. | Zbl

[14] Guo, X. P., Ye, L. E.: New discount and average optimality conditions for continuous-time Markov decision processes. Adv. in Appl. Probab. 42 (2010), 953-985. | DOI | MR

[15] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York 1996. | MR | Zbl

[16] Hernández-Lerma, O., Lasserre, J. B.: Further Topics on Discrete-Time Markov Control Processes. Springer, New York 1999. | MR | Zbl

[17] Meyn, S. P., Tweedie, R. L.: Computable bounds for geometric convergence rates of Markov chains. Ann. Appl. Probab. 4 (1994), 981-1011. | DOI | MR | Zbl

[18] Miller, B. L.: Finite state continuous time Markov decision processes with finite planning horizon. SIAM J. Control 6 (1968), 266-280. | DOI | MR

[19] Pliska, S. R.: Controlled jump processes. Stochastic Process. Appl. 3 (1975), 259-282. | MR | Zbl

[20] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York 1994. | MR | Zbl

[21] Ye, L. E., Guo, X. P.: New sufficient conditions for average optimality in continuous-time Markov decision processes. Math. Methods Oper. Res. 72 (2010), 75-94. | DOI | MR | Zbl

[22] Yushkevich, A. A.: Controlled jump Markov models. Theory Probab. Appl. 25 (1980), 244-266. | DOI | Zbl

[23] Zhu, Q. X.: Average optimality inequality for continuous-time Markov decision processes in Polish spaces. Math. Methods Oper. Res. 66 (2007), 299-313. | DOI | MR | Zbl

[24] Zhu, Q. X.: Average optimality for continuous-time Markov decision processes with a policy iteration approach. J. Math. Anal. Appl. 339 (2008), 691-704. | DOI | MR | Zbl
