Partially observable Markov decision processes with partially observable random discount factors
Kybernetika, Volume 58 (2022) no. 6, pp. 960-983
This article was harvested from the Czech Digital Mathematics Library.

This paper deals with a class of partially observable discounted Markov decision processes defined on Borel state and action spaces, under unbounded one-stage cost. The discount rate is a stochastic process evolving according to a difference equation, which is also assumed to be partially observable. Introducing a suitable control model and filtering processes, we prove the existence of optimal control policies. In addition, we illustrate our results in a class of GI/GI/1 queueing systems where we obtain explicitly the corresponding optimality equation and the filtering process.
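
As a rough illustration of the criterion described in the abstract, the following minimal Python sketch accumulates a one-stage cost discounted by a time-varying factor that evolves according to a difference equation. The recursion `G`, the function names, and all numbers are hypothetical, chosen only for illustration and not taken from the paper; the sketch also ignores partial observability and the filtering step entirely.

```python
def simulate_discounts(alpha0, noises, G):
    """Iterate the difference equation alpha_{t+1} = G(alpha_t, eta_t)."""
    alphas = [alpha0]
    for eta in noises:
        alphas.append(G(alphas[-1], eta))
    return alphas


def discounted_cost(costs, alphas):
    """Sum of Gamma_t * c_t, with Gamma_0 = 1 and Gamma_t = alpha_0 * ... * alpha_{t-1}."""
    total, gamma = 0.0, 1.0
    for c, a in zip(costs, alphas):
        total += gamma * c   # cost at stage t weighted by the accumulated discount
        gamma *= a           # update the accumulated discount for stage t + 1
    return total


def G(alpha, eta):
    """Hypothetical recursion (an assumption): average the factor with the noise term."""
    return 0.5 * alpha + 0.5 * eta


alphas = simulate_discounts(0.5, [0.5, 0.5], G)  # three discount factors
print(discounted_cost([1.0, 1.0, 1.0], alphas))
```

With a constant factor the sum reduces to the usual geometric discounting; a random noise sequence would make each run a single sample of the random discounted cost.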
DOI: 10.14736/kyb-2022-6-0960
Classification: 90B22, 90C39
Keywords: partially observable systems; discounted criterion; random discount factors; queueing models; optimal policies
@article{10_14736_kyb_2022_6_0960,
     author = {Martinez-Garcia, E. Everardo and Minj\'arez-Sosa, J. Adolfo and Vega-Amaya, Oscar},
     title = {Partially observable {Markov} decision processes with partially observable random discount factors},
     journal = {Kybernetika},
     pages = {960--983},
     year = {2022},
     volume = {58},
     number = {6},
     doi = {10.14736/kyb-2022-6-0960},
     mrnumber = {4548223},
     zbl = {07655866},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2022-6-0960/}
}
TY  - JOUR
AU  - Martinez-Garcia, E. Everardo
AU  - Minjárez-Sosa, J. Adolfo
AU  - Vega-Amaya, Oscar
TI  - Partially observable Markov decision processes with partially observable random discount factors
JO  - Kybernetika
PY  - 2022
SP  - 960
EP  - 983
VL  - 58
IS  - 6
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2022-6-0960/
DO  - 10.14736/kyb-2022-6-0960
LA  - en
ID  - 10_14736_kyb_2022_6_0960
ER  - 
%0 Journal Article
%A Martinez-Garcia, E. Everardo
%A Minjárez-Sosa, J. Adolfo
%A Vega-Amaya, Oscar
%T Partially observable Markov decision processes with partially observable random discount factors
%J Kybernetika
%D 2022
%P 960-983
%V 58
%N 6
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2022-6-0960/
%R 10.14736/kyb-2022-6-0960
%G en
%F 10_14736_kyb_2022_6_0960
Martinez-Garcia, E. Everardo; Minjárez-Sosa, J. Adolfo; Vega-Amaya, Oscar. Partially observable Markov decision processes with partially observable random discount factors. Kybernetika, Volume 58 (2022) no. 6, pp. 960-983. doi: 10.14736/kyb-2022-6-0960

