A stopping rule for discounted Markov decision processes with finite action sets

Montes-de-Oca, Raúl; Lemus-Rodríguez, Enrique; Cruz-Suárez, Daniel

Kybernetika, Volume 45 (2009) no. 5, pp. 755-767

This article was harvested from the Czech Digital Mathematics Library.

Abstract

In a Discounted Markov Decision Process (DMDP) with finite action sets, the Value Iteration Algorithm leads, under suitable conditions, to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed for convergence is of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound that depends only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain constants of this kind using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
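To make the notion of a stopping rule for value iteration concrete, here is a minimal Python sketch for a finite DMDP with discounted costs. The abstract does not state the paper's structural bound, so this sketch does not implement it. Instead it uses the classical sup-norm criterion found in standard texts such as Puterman's: stop once successive iterates differ by less than eps(1 - alpha)/(2 alpha), after which a greedy policy with respect to the last iterate is eps-optimal. The names value_iteration, P, c, alpha and the two-state example are hypothetical and chosen only for illustration.

```python
def value_iteration(P, c, alpha, eps=1e-6, max_iter=10_000):
    """Value iteration for a finite discounted-cost MDP (illustrative sketch).

    P[a][s][t] : probability of moving from state s to state t under action a
    c[s][a]    : one-step cost of action a in state s
    alpha      : discount factor, 0 < alpha < 1
    Returns (v, policy, n): last iterate, a greedy policy for it, steps used.
    """
    n_states, n_actions = len(c), len(c[0])

    def q_values(v):
        # One-step lookahead costs for every (state, action) pair.
        return [[c[s][a] + alpha * sum(P[a][s][t] * v[t] for t in range(n_states))
                 for a in range(n_actions)]
                for s in range(n_states)]

    # Classical sup-norm threshold (NOT the structural bound of the paper).
    threshold = eps * (1 - alpha) / (2 * alpha)
    v = [0.0] * n_states
    for n in range(1, max_iter + 1):
        v_new = [min(row) for row in q_values(v)]
        diff = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if diff < threshold:
            # Greedy (cost-minimizing) policy with respect to the last iterate.
            policy = [row.index(min(row)) for row in q_values(v)]
            return v, policy, n
    raise RuntimeError("stopping criterion not met within max_iter iterations")


# Hypothetical two-state, two-action example. The transition law does not
# depend on the action, so the optimal policy simply picks the cheaper
# one-step cost in each state, which makes the result easy to check by hand.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.5, 0.5], [0.5, 0.5]],  # action 1
]
c = [
    [1.0, 2.0],  # state 0: action 0 is cheaper
    [3.0, 0.5],  # state 1: action 1 is cheaper
]
v, policy, n_iter = value_iteration(P, c, alpha=0.9)
print(policy, [round(x, 4) for x in v], n_iter)
```

This criterion is computed from the iterates themselves, so it only becomes available while the algorithm runs. A bound that depends only on structural properties of the process, as described in the abstract, would fix the number of iterations before the algorithm starts.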

Classification: 90C40, 93E20
Keywords: Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies

@article{KYB_2009_45_5_a4,
     author = {Montes-de-Oca, Ra\'ul and Lemus-Rodr{\'\i}guez, Enrique and Cruz-Su\'arez, Daniel},
     title = {A stopping rule for discounted {Markov} decision processes with finite action sets},
     journal = {Kybernetika},
     pages = {755--767},
     year = {2009},
     volume = {45},
     number = {5},
     mrnumber = {2599110},
     zbl = {1190.93107},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2009_45_5_a4/}
}

TY  - JOUR
AU  - Montes-de-Oca, Raúl
AU  - Lemus-Rodríguez, Enrique
AU  - Cruz-Suárez, Daniel
TI  - A stopping rule for discounted Markov decision processes with finite action sets
JO  - Kybernetika
PY  - 2009
SP  - 755
EP  - 767
VL  - 45
IS  - 5
UR  - http://geodesic.mathdoc.fr/item/KYB_2009_45_5_a4/
LA  - en
ID  - KYB_2009_45_5_a4
ER  -

%0 Journal Article
%A Montes-de-Oca, Raúl
%A Lemus-Rodríguez, Enrique
%A Cruz-Suárez, Daniel
%T A stopping rule for discounted Markov decision processes with finite action sets
%J Kybernetika
%D 2009
%P 755-767
%V 45
%N 5
%U http://geodesic.mathdoc.fr/item/KYB_2009_45_5_a4/
%G en
%F KYB_2009_45_5_a4

Montes-de-Oca, Raúl; Lemus-Rodríguez, Enrique; Cruz-Suárez, Daniel. A stopping rule for discounted Markov decision processes with finite action sets. Kybernetika, Volume 45 (2009) no. 5, pp. 755-767. http://geodesic.mathdoc.fr/item/KYB_2009_45_5_a4/

References

[1] D. Cruz-Suárez and R. Montes-de-Oca: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. Soc. Mat. Mexicana 12 (2006), 133–148.

[2] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva: Conditions for the uniqueness of discounted Markov decision processes. Math. Methods Oper. Res. 60 (2004), 415–436.

[3] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva: Uniform approximations of discounted Markov decision processes to optimal policies. In: Proc. Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), MATFYZPRESS, Prague 2006, pp. 278–287.

[4] O. Hernández-Lerma: Adaptive Markov Control Processes. Springer-Verlag, New York 1989.

[5] O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.

[6] O. Hernández-Lerma and J. B. Lasserre: Further Topics on Discrete-Time Markov Control Processes. Springer-Verlag, New York 1999.

[7] M. L. Puterman: Markov Decision Processes. Discrete Stochastic Dynamic Programming. Wiley, New York 1994.

[8] R. Ritt and L. Sennott: Optimal stationary policies in general state Markov decision chains with finite action sets. Math. Oper. Res. 17 (1992), 901–909.

[9] N. L. Stokey and R. E. Lucas: Recursive Methods in Economic Dynamics. Harvard University Press, USA 1989.
