Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors
Kybernetika, Volume 57 (2021) no. 2, pp. 295-311
This article was harvested from the Czech Digital Mathematics Library.


This paper focuses on the constrained optimality of discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, Borel state and compact Borel action spaces, and possibly unbounded costs. Using the properties of the so-called occupation measures of policies, and by transforming the original constrained optimality problem of DTMDPs into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.
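As a rough sketch of the kind of reformulation the abstract describes, the problem can be written as follows. The notation is assumed here for illustration and is not taken from the paper: $\alpha(x)$ is the state-dependent discount factor, $c_0$ the cost to be minimized, $c_1,\dots,c_N$ the constrained costs with bounds $\kappa_1,\dots,\kappa_N$, $\gamma$ the initial distribution, and $\eta^\pi$ the occupation measure of a policy $\pi$. The paper's precise definitions and conditions may differ.

```latex
% Assumed notation, for illustration only; the empty product (t = 0) equals 1.
\begin{align*}
V_i(\pi) &= \mathbb{E}_\gamma^\pi\!\left[\sum_{t=0}^{\infty}
            \Bigl(\prod_{k=0}^{t-1}\alpha(x_k)\Bigr)\, c_i(x_t,a_t)\right],
            \qquad i = 0,1,\dots,N, \\
% Original constrained problem over policies.
&\min_{\pi}\; V_0(\pi)
   \quad\text{subject to}\quad V_i(\pi)\le \kappa_i,\quad i=1,\dots,N, \\
% Occupation measure of a policy on state-action sets B x C.
\eta^\pi(B\times C) &= \sum_{t=0}^{\infty}\mathbb{E}_\gamma^\pi\!\left[
            \Bigl(\prod_{k=0}^{t-1}\alpha(x_k)\Bigr)\,
            \mathbf{1}\{x_t\in B,\ a_t\in C\}\right], \\
% Each cost is linear in the occupation measure.
V_i(\pi) &= \int c_i \,\mathrm{d}\eta^\pi .
\end{align*}
```

Under these assumptions each $V_i$ is linear in $\eta^\pi$, so the constrained problem becomes the minimization of a linear functional over the set of occupation measures subject to linear inequality constraints. This is a convex program whenever that set is convex, which is the structure the abstract refers to.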
DOI: 10.14736/kyb-2021-2-0295
Classification: 60J27, 90C40
Keywords: constrained optimality problem; discrete-time Markov decision processes; Borel state and action spaces; varying discount factors; unbounded costs
@article{10_14736_kyb_2021_2_0295,
     author = {Wu, Xiao and Tang, Yanqiu},
     title = {Constrained optimality problem of {Markov} decision processes with {Borel} spaces and varying discount factors},
     journal = {Kybernetika},
     pages = {295--311},
     year = {2021},
     volume = {57},
     number = {2},
     doi = {10.14736/kyb-2021-2-0295},
     mrnumber = {4273577},
     zbl = {07396268},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-2-0295/}
}
TY  - JOUR
AU  - Wu, Xiao
AU  - Tang, Yanqiu
TI  - Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors
JO  - Kybernetika
PY  - 2021
SP  - 295
EP  - 311
VL  - 57
IS  - 2
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-2-0295/
DO  - 10.14736/kyb-2021-2-0295
LA  - en
ID  - 10_14736_kyb_2021_2_0295
ER  - 
%0 Journal Article
%A Wu, Xiao
%A Tang, Yanqiu
%T Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors
%J Kybernetika
%D 2021
%P 295-311
%V 57
%N 2
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-2-0295/
%R 10.14736/kyb-2021-2-0295
%G en
%F 10_14736_kyb_2021_2_0295
Wu, Xiao; Tang, Yanqiu. Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors. Kybernetika, Volume 57 (2021) no. 2, pp. 295-311. doi: 10.14736/kyb-2021-2-0295

