A stopping rule for discounted Markov decision processes with finite action sets
Kybernetika, Tome 45 (2009) no. 5, pp. 755-767.

Voir la notice de l'article provenant de la source Czech Digital Mathematics Library

In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the necessary number of steps till gaining convergence is an issue of great theoretical and practical interest as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own. It should be mentioned that other authors find such kind of constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition and the corresponding ergodicity index plays a critical role in the upper bound.
Classification : 90C40, 93E20
Keywords: Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies
@article{KYB_2009__45_5_a4,
     author = {Montes-de-Oca, Ra\'ul and Lemus-Rodr{\'\i}guez, Enrique and Cruz-Su\'arez, Daniel},
     title = {A stopping rule for discounted {Markov} decision processes with finite action sets},
     journal = {Kybernetika},
     pages = {755--767},
     publisher = {mathdoc},
     volume = {45},
     number = {5},
     year = {2009},
     mrnumber = {2599110},
     zbl = {1190.93107},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2009__45_5_a4/}
}
TY  - JOUR
AU  - Montes-de-Oca, Raúl
AU  - Lemus-Rodríguez, Enrique
AU  - Cruz-Suárez, Daniel
TI  - A stopping rule for discounted Markov decision processes with finite action sets
JO  - Kybernetika
PY  - 2009
SP  - 755
EP  - 767
VL  - 45
IS  - 5
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/KYB_2009__45_5_a4/
LA  - en
ID  - KYB_2009__45_5_a4
ER  - 
%0 Journal Article
%A Montes-de-Oca, Raúl
%A Lemus-Rodríguez, Enrique
%A Cruz-Suárez, Daniel
%T A stopping rule for discounted Markov decision processes with finite action sets
%J Kybernetika
%D 2009
%P 755-767
%V 45
%N 5
%I mathdoc
%U http://geodesic.mathdoc.fr/item/KYB_2009__45_5_a4/
%G en
%F KYB_2009__45_5_a4
Montes-de-Oca, Raúl; Lemus-Rodríguez, Enrique; Cruz-Suárez, Daniel. A stopping rule for discounted Markov decision processes with finite action sets. Kybernetika, Tome 45 (2009) no. 5, pp. 755-767. http://geodesic.mathdoc.fr/item/KYB_2009__45_5_a4/