Keywords: Markov decision processes; average cost criterion; approximate value iteration algorithm; contraction and non-expansive operators; perturbed Markov decision models
@article{10_14736_kyb_2019_1_0081,
author = {Vega-Amaya, \'Oscar and L\'opez-Borb\'on, Joaqu{\'\i}n},
title = {A perturbation approach to approximate value iteration for average cost {Markov} decision processes with {Borel} spaces and bounded costs},
journal = {Kybernetika},
pages = {81--113},
year = {2019},
volume = {55},
number = {1},
doi = {10.14736/kyb-2019-1-0081},
mrnumber = {3935416},
zbl = {07088880},
language = {en},
url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0081/}
}
TY - JOUR
AU - Vega-Amaya, Óscar
AU - López-Borbón, Joaquín
TI - A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
JO - Kybernetika
PY - 2019
SP - 81
EP - 113
VL - 55
IS - 1
UR - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0081/
DO - 10.14736/kyb-2019-1-0081
LA - en
ID - 10_14736_kyb_2019_1_0081
ER -
%0 Journal Article
%A Vega-Amaya, Óscar
%A López-Borbón, Joaquín
%T A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
%J Kybernetika
%D 2019
%P 81-113
%V 55
%N 1
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0081/
%R 10.14736/kyb-2019-1-0081
%G en
%F 10_14736_kyb_2019_1_0081
Vega-Amaya, Óscar; López-Borbón, Joaquín. A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs. Kybernetika, Volume 55 (2019) no. 1, pp. 81-113. doi: 10.14736/kyb-2019-1-0081
[1] Abounadi, J., Bertsekas, D., Borkar, V. S.: Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40 (2001), 681-698. | DOI | MR
[2] Aliprantis, C. D., Border, K. C.: Infinite Dimensional Analysis. Third edition. Springer-Verlag, Berlin 2006. | MR
[3] Almudevar, A.: Approximate fixed point iteration with an application to infinite horizon Markov decision processes. SIAM J. Control Optim. 47 (2008), 2303-2347. | DOI | MR
[4] Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., Marcus, S. I.: Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31 (1993), 282-344. | DOI | MR
[5] Beutel, L., Gonska, H., Kacsó, D.: On variation-diminishing Schoenberg operators: new quantitative statements. In: Multivariate Approximation and Interpolations with Applications (M. Gasca, ed.), Monografías de la Academia de Ciencias de Zaragoza No. 20, 2002, pp. 9-58. | MR
[6] Bertsekas, D. P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs NJ 1987. | MR | Zbl
[7] Bertsekas, D. P., Tsitsiklis, J. N.: Neuro-Dynamic Programming. Athena Scientific, Belmont 1996. | MR | Zbl
[8] Bertsekas, D. P.: Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9 (2011), 310-335. | DOI | MR
[9] Chang, H. S., Marcus, S. I.: Approximate receding horizon approach for Markov decision processes: average reward case. J. Math. Anal. Appl. 286 (2003), 636-651. | DOI | MR
[10] Chang, H. S., Hu, J., Fu, M. C., Marcus, S. I.: Simulation-Based Algorithms for Markov Decision Processes. Second edition. Springer-Verlag, London 2013. | DOI | MR
[11] Cooper, W. L., Henderson, S. G., Lewis, M. E.: Convergence of simulation-based policy iteration. Prob. Eng. Inform. Sci. 17 (2003), 213-234. | DOI | MR
[12] DeVore, R. A.: The Approximation of Continuous Functions by Positive Linear Operators. Lecture Notes in Mathematics 293. Springer-Verlag, Berlin, Heidelberg 1972. | DOI | MR
[13] Dorea, C. C. Y., Pereira, A. G. C.: A note on a variation of Doeblin's condition for uniform ergodicity of Markov chains. Acta Math. Hungar. 110 (2006), 287-292. | DOI | MR
[14] Dufour, F., Prieto-Rumeau, T.: Approximation of Markov decision processes with general state space. J. Math. Anal. Appl. 388 (2012), 1254-1267. | DOI | MR
[15] Dufour, F., Prieto-Rumeau, T.: Stochastic approximations of constrained discounted Markov decision processes. J. Math. Anal. Appl. 413 (2014), 856-879. | DOI | MR
[16] Dufour, F., Prieto-Rumeau, T.: Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics 87 (2015), 273-307. | DOI | MR
[17] Farias, D. P. de, Roy, B. van: On the existence of fixed points for approximate value iteration and temporal difference learning. J. Optim. Theory Appl. 105 (2000), 589-608. | DOI | MR
[18] Farias, D. P. de, Roy, B. van: Approximate linear programming for average-cost dynamic programming. In: Advances in Neural Information Processing Systems 15 (S. Becker, S. Thrun and K. Obermayer, eds.), MIT Press, Cambridge MA 2002, pp. 1587-1594.
[19] Farias, D. P. de, Roy, B. van: A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees. Math. Oper. Res. 31 (2006), 597-620. | DOI | MR
[20] Gordon, G. J.: Stable function approximation in dynamic programming. In: Proc. Twelfth International Conference on Machine Learning (A. Prieditis and S. J. Russell, eds.), Tahoe City CA 1995, pp. 261-268. | DOI
[21] Gosavi, A.: A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis. Machine Learning 55 (2004), 5-29. | DOI | MR
[22] Hernández-Lerma, O.: Adaptive Markov Control Processes. Springer-Verlag, NY 1989. | DOI | MR | Zbl
[23] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes. Basic Optimality Criteria. Springer-Verlag, NY 1996. | DOI | MR | Zbl
[24] Hernández-Lerma, O., Lasserre, J. B.: Further Topics on Discrete-Time Markov Control Processes. Springer-Verlag, NY 1999. | DOI | MR | Zbl
[25] Hernández-Lerma, O., Lasserre, J. B.: Markov Chains and Invariant Probabilities. Birkhäuser Verlag, Basel 2003. | DOI | MR | Zbl
[26] Hernández-Lerma, O., Montes-de-Oca, R., Cavazos-Cadena, R.: Recurrence conditions for Markov decision processes with Borel spaces: a survey. Ann. Oper. Res. 29 (1991), 29-46. | DOI | MR
[27] Hernández-Lerma, O., Vega-Amaya, O., Carrasco, G.: Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optim. 38 (1999), 79-93. | DOI | MR | Zbl
[28] Jaśkiewicz, A., Nowak, A. S.: On the optimality equation for average cost Markov control processes with Feller transition probabilities. J. Math. Anal. Appl. 316 (2006), 495-509. | DOI | MR
[29] Klein, E., Thompson, A. C.: Theory of Correspondences. Wiley, New York 1984. | MR
[30] Konda, V. R., Tsitsiklis, J. N.: Actor-critic algorithms. SIAM J. Control Optim. 42 (2003), 1143-1166. | DOI | MR
[31] Lee, J. M., Lee, J. H.: Approximate dynamic programming strategies and their applicability for process control: a review and future directions. Int. J. Control Automat. Systems 2 (2004), 263-278.
[32] Meyn, S. P., Tweedie, R. L.: Markov Chains and Stochastic Stability. Springer-Verlag, London 1993. | DOI | MR
[33] Montes-de-Oca, R., Lemus-Rodríguez, E.: An unbounded Berge's minimum theorem with applications to discounted Markov decision processes. Kybernetika 48 (2012), 268-286. | MR | Zbl
[34] Munos, R.: Performance bounds in $L_{p}$-norm for approximate value iteration. SIAM J. Control Optim. 46 (2007), 541-561. | DOI | MR
[35] Nowak, A. S.: A generalization of Ueno's inequality for n-step transition probabilities. Appl. Math. 25 (1998), 295-299. | DOI | MR
[36] Ortner, R.: Pseudometrics for state aggregation in average reward Markov decision processes. In: Algorithmic Learning Theory LNAI 4754 (M. Hutter, R. A. Servedio and E. Takimoto, eds.), Springer, Berlin, Heidelberg 2007, pp. 373-387. | DOI
[37] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, NY 1994. | DOI | MR | Zbl
[38] Powell, W. B.: Approximate Dynamic Programming. Solving the Curses of Dimensionality. John Wiley and Sons Inc., 2007. | DOI | MR
[39] Powell, W. B.: What you should know about approximate dynamic programming. Naval Res. Logist. 56 (2009), 239-249. | DOI | MR
[40] Powell, W. B.: Perspectives of approximate dynamic programming. Ann. Oper. Res. 241 (2012), 319-356. | DOI | MR
[41] Powell, W. B., Ma, J.: A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications. J. Control Theory Appl. 9 (2011), 336-352. | DOI | MR
[42] Robles-Alcaraz, M. T., Vega-Amaya, O., Minjárez-Sosa, J. A.: Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces. Risk Decision Anal. 6 (2017), 79-95. | DOI
[43] Rust, J.: Numerical dynamic programming in economics. In: Handbook of Computational Economics, Vol. 1 (H. Amman, D. Kendrick and J. Rust, eds.), North-Holland, Amsterdam 1996, pp. 619-728. | DOI | MR
[44] Santos, M. S.: Analysis of a numerical dynamic programming algorithm applied to economic models. Econometrica 66 (1998), 409-426. | DOI | MR
[45] Santos, M. S., Rust, J.: Convergence properties of policy iteration. SIAM J. Control Optim. 42 (2004), 2094-2115. | DOI | MR
[46] Saldi, N., Yüksel, S., Linder, T.: Asymptotic optimality of finite approximations to Markov decision processes with Borel spaces. Math. Oper. Res. 42 (2017), 945-978. | DOI | MR
[47] Stachurski, J.: Continuous state dynamic programming via nonexpansive approximation. Comput. Economics 31 (2008), 141-160. | DOI
[48] Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge MA 1998. | DOI
[49] Roy, B. van: Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31 (2006), 234-244. | DOI | MR
[50] Vega-Amaya, O.: The average cost optimality equation: a fixed-point approach. Bol. Soc. Mat. Mexicana 9 (2003), 185-195. | MR
[51] Vega-Amaya, O.: Zero-sum average semi-Markov games: fixed point solutions of the Shapley equation. SIAM J. Control Optim. 42 (2003), 1876-1894. | DOI | MR
[52] Vega-Amaya, O.: Solutions of the average cost optimality equation for Markov decision processes with weakly continuous kernel: The fixed-point approach revisited. J. Math. Anal. Appl. 464 (2018), 152-163. | DOI | MR
[53] Vega-Amaya, O., López-Borbón, J.: A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces. J. Dyn. Games 3 (2016), 261-278. | DOI | MR
[54] Vega-Amaya, O., Montes-de-Oca, R.: Application of average dynamic programming to inventory systems. Math. Methods Oper. Res. 47 (1998), 451-471. | DOI | MR