Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains
Kybernetika, Volume 41 (2005) no. 6, pp. 757-772. This article was harvested from the Czech Digital Mathematics Library.


This paper deals with Markov decision processes (MDPs) whose state space is real and attains its minimum, and which are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. The costs are possibly unbounded, and the quality of each policy is evaluated by the average cost. For this objective function we consider two Markov control models $\mathbb{P}$ and ${\mathbb{P}}_{1}$ that have the same components except for their transition laws. The transition law $q$ of $\mathbb{P}$ is taken as unknown, and the transition law $q_{1}$ of ${\mathbb{P}}_{1}$ as a known approximation of $q$. Under certain irreducibility, recurrence and ergodicity conditions imposed on the bounding SO Markov chain (these conditions give the rate of convergence of the $t$-step transition probability, $t=1,2,\ldots$, to the invariant measure), we estimate the difference between the cost of driving $\mathbb{P}$ with the optimal policy of ${\mathbb{P}}_{1}$ and the optimal cost of driving $\mathbb{P}$. This difference is defined as the index of perturbations, and upper bounds for it are provided. An example illustrating the theory is included.
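As a reading aid, the index of perturbations described in the abstract can be sketched in symbols. The notation below ($J$, $c$, $\pi_{1}^{*}$, $\Delta$) is assumed here for illustration and is not taken from the paper; the average cost is written in its usual long-run expected form, with expectations taken under the transition law $q$ of $\mathbb{P}$:

$$
J(\pi, x) := \limsup_{n \to \infty} \frac{1}{n}\, E_{x}^{\pi}\!\left[\sum_{t=0}^{n-1} c(x_{t}, a_{t})\right],
\qquad
\Delta(x) := J(\pi_{1}^{*}, x) - \inf_{\pi} J(\pi, x) \;\geq\; 0 .
$$

Here $c$ denotes the one-step cost, $x$ the initial state, and $\pi_{1}^{*}$ a policy that is average-cost optimal for the approximating model ${\mathbb{P}}_{1}$. Under this reading, $\Delta(x)$ measures the extra average cost incurred when $\mathbb{P}$ is driven by a policy computed from $q_{1}$ rather than from the unknown $q$.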
Classification: 90C40, 93E20
Keywords: stochastically ordered Markov chains; Lyapunov condition; invariant probability; average Markov decision processes
@article{KYB_2005_41_6_a5,
     author = {Montes-de-Oca, Ra\'ul and Salem-Silva, Francisco},
     title = {Estimates for perturbations of average {Markov} decision processes with a minimal state and upper bounded by stochastically ordered {Markov} chains},
     journal = {Kybernetika},
     pages = {757--772},
     year = {2005},
     volume = {41},
     number = {6},
     mrnumber = {2193864},
     zbl = {1249.90313},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2005_41_6_a5/}
}
TY  - JOUR
AU  - Montes-de-Oca, Raúl
AU  - Salem-Silva, Francisco
TI  - Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains
JO  - Kybernetika
PY  - 2005
SP  - 757
EP  - 772
VL  - 41
IS  - 6
UR  - http://geodesic.mathdoc.fr/item/KYB_2005_41_6_a5/
LA  - en
ID  - KYB_2005_41_6_a5
ER  - 
%0 Journal Article
%A Montes-de-Oca, Raúl
%A Salem-Silva, Francisco
%T Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains
%J Kybernetika
%D 2005
%P 757-772
%V 41
%N 6
%U http://geodesic.mathdoc.fr/item/KYB_2005_41_6_a5/
%G en
%F KYB_2005_41_6_a5
Montes-de-Oca, Raúl; Salem-Silva, Francisco. Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains. Kybernetika, Volume 41 (2005) no. 6, pp. 757-772. http://geodesic.mathdoc.fr/item/KYB_2005_41_6_a5/
