Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
Kybernetika, Volume 55 (2019) no. 1, pp. 166-182
This article was harvested from the Czech Digital Mathematics Library.


In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action-dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi_n)$, $n=0,1,\ldots$, with state-action-dependent discount factors of the form $\alpha_n(x_n,a_n)$, where $a_n$ and $\xi_n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace\alpha_n\rbrace$, $\lbrace c_n\rbrace$, and $\lbrace G_n\rbrace$ converge, in a certain sense, to $\alpha_\infty$, $c_\infty$, and $G_\infty$, our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system $\mathcal{M}_\infty$ corresponding to $\alpha_\infty$, $c_\infty$, and $G_\infty$. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.
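The cost criterion described in the abstract can be sketched numerically: the state evolves as $x_{n+1}=G_n(x_n,a_n,\xi_n)$ and each stage cost is weighted by the product of the state-action-dependent discount factors accumulated so far, $\prod_{k<n}\alpha_k(x_k,a_k)$. The following minimal Python sketch uses toy dynamics, a toy cost, and a toy discount function chosen purely for illustration (none of them come from the paper); it is not the authors' model, only a hedged example of the criterion's structure.

```python
import random

# Toy illustration (NOT the paper's model): simulate a controlled system
# x_{n+1} = G_n(x_n, a_n, xi_n) and accumulate the discounted cost
#   sum_n ( prod_{k<n} alpha_k(x_k, a_k) ) * c_n(x_n, a_n),
# where the discount factor alpha_n depends on the state-action pair.

def G(n, x, a, xi):
    # hypothetical dynamics: a contraction plus control plus noise
    return 0.5 * x + a + xi

def cost(n, x, a):
    # hypothetical one-stage cost, unbounded in the state variable
    return x * x + a * a

def alpha(n, x, a):
    # hypothetical state-action-dependent discount factor in (0, 1)
    return 0.9 / (1.0 + 0.1 * abs(x) + 0.1 * abs(a))

def discounted_cost(x0, policy, horizon, rng):
    x, total, disc = x0, 0.0, 1.0
    for n in range(horizon):
        a = policy(n, x)
        total += disc * cost(n, x, a)
        disc *= alpha(n, x, a)  # discount accumulates multiplicatively
        x = G(n, x, a, rng.gauss(0.0, 0.1))
    return total

rng = random.Random(0)
v = discounted_cost(1.0, lambda n, x: -0.25 * x, horizon=50, rng=rng)
print(v)
```

Because each $\alpha_n(x_n,a_n)$ here is bounded below $1$, the accumulated discount decays at least geometrically, so the truncated sum already approximates the infinite-horizon cost well.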
DOI : 10.14736/kyb-2019-1-0166
Classification : 90C40, 93E20
Keywords: discounted optimality; non-constant discount factor; time-varying Markov decision processes
@article{10_14736_kyb_2019_1_0166,
     author = {Escobedo-Trujillo, Beatris A. and Higuera-Chan, Carmen G.},
     title = {Time-varying {Markov} decision processes with state-action-dependent discount factors and unbounded costs},
     journal = {Kybernetika},
     pages = {166--182},
     year = {2019},
     volume = {55},
     number = {1},
     doi = {10.14736/kyb-2019-1-0166},
     mrnumber = {3935420},
     zbl = {07088884},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0166/}
}
TY  - JOUR
AU  - Escobedo-Trujillo, Beatris A.
AU  - Higuera-Chan, Carmen G.
TI  - Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
JO  - Kybernetika
PY  - 2019
SP  - 166
EP  - 182
VL  - 55
IS  - 1
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0166/
DO  - 10.14736/kyb-2019-1-0166
LA  - en
ID  - 10_14736_kyb_2019_1_0166
ER  - 
%0 Journal Article
%A Escobedo-Trujillo, Beatris A.
%A Higuera-Chan, Carmen G.
%T Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
%J Kybernetika
%D 2019
%P 166-182
%V 55
%N 1
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0166/
%R 10.14736/kyb-2019-1-0166
%G en
%F 10_14736_kyb_2019_1_0166
Escobedo-Trujillo, Beatris A.; Higuera-Chan, Carmen G. Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs. Kybernetika, Volume 55 (2019) no. 1, pp. 166-182. doi: 10.14736/kyb-2019-1-0166

