Estimates of stability of Markov control processes with unbounded costs
Kybernetika, Volume 36 (2000) no. 2, pp. 195-210. This article was harvested from the Czech Digital Mathematics Library.

For a discrete-time Markov control process with transition probability $p$, we compare the total discounted costs $V_\beta(\pi_\beta)$ and $V_\beta(\tilde{\pi}_\beta)$ obtained by applying the optimal control policy $\pi_\beta$ and its approximation $\tilde{\pi}_\beta$. The policy $\tilde{\pi}_\beta$ is optimal for an approximating process with transition probability $\tilde{p}$. The cost per stage of the processes considered may be unbounded. Under certain ergodicity assumptions we establish an upper bound for the relative stability index $[V_\beta(\tilde{\pi}_\beta)-V_\beta(\pi_\beta)]/V_\beta(\pi_\beta)$. This bound does not depend on the discount factor $\beta \in (0,1)$ and is expressed in terms of the total variation distance between $p$ and $\tilde{p}$.
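The relative stability index can be illustrated numerically on a toy model. The following is a minimal sketch, not the authors' method: it assumes a finite state and action space with bounded, strictly positive costs (the paper treats general spaces and unbounded costs), and every name in it (`value_iteration`, `evaluate_policy`, `relative_stability_index`, `total_variation`, the arrays `P`, `P_tilde`, `c`) is hypothetical. The total variation distance is taken here as the largest sum of absolute differences over state-action pairs; some authors use half of that quantity.

```python
import numpy as np

def value_iteration(P, c, beta, tol=1e-10, max_iter=100_000):
    """Minimize the discounted cost. P[a, x, y] is a transition kernel, c[x, a] a cost."""
    V = np.zeros(c.shape[0])
    for _ in range(max_iter):
        # Q[x, a] = c(x, a) + beta * sum_y P[a, x, y] * V(y)
        Q = c + beta * np.einsum("axy,y->xa", P, V)
        V_new = Q.min(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmin(axis=1)

def evaluate_policy(P, c, beta, policy):
    """Exact discounted cost of a stationary deterministic policy under kernel P."""
    idx = np.arange(c.shape[0])
    P_pi = P[policy, idx, :]   # row x is P[policy[x], x, :]
    c_pi = c[idx, policy]
    return np.linalg.solve(np.eye(len(idx)) - beta * P_pi, c_pi)

def relative_stability_index(P, P_tilde, c, beta):
    """[V(pi_tilde) - V(pi)] / V(pi), with both costs evaluated under the true kernel P."""
    _, pi = value_iteration(P, c, beta)
    _, pi_tilde = value_iteration(P_tilde, c, beta)
    V_opt = evaluate_policy(P, c, beta, pi)
    V_approx = evaluate_policy(P, c, beta, pi_tilde)
    return (V_approx - V_opt) / V_opt

def total_variation(P, P_tilde):
    """Largest L1 distance between p(.|x,a) and p~(.|x,a) over all (x, a)."""
    return np.abs(P - P_tilde).sum(axis=2).max()

# Illustrative data (assumed, not from the paper): 3 states, 2 actions.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],   # action 0
    [[0.5, 0.5, 0.0], [0.5, 0.4, 0.1], [0.3, 0.3, 0.4]],   # action 1
])
c = np.array([[1.0, 2.0], [2.0, 2.5], [4.0, 5.0]])          # c[x, a] > 0
P_tilde = 0.95 * P + 0.05 / 3.0                              # perturbed kernel, rows still sum to 1

for beta in (0.5, 0.9, 0.99):
    index = relative_stability_index(P, P_tilde, c, beta)
    print(beta, total_variation(P, P_tilde), index.max())
```

Running the loop for several values of $\beta$ gives a rough feel for how the index behaves relative to the distance between the two kernels; it does not reproduce or verify the bound established in the paper.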
Classification : 60J99, 90C40, 93C55, 93E20
Keywords: discrete-time Markov control process; unbounded cost
@article{KYB_2000_36_2_a3,
     author = {Gordienko, Evgueni I. and Salem, Francisco},
     title = {Estimates of stability of {Markov} control processes with unbounded costs},
     journal = {Kybernetika},
     pages = {195--210},
     year = {2000},
     volume = {36},
     number = {2},
     mrnumber = {1760024},
     zbl = {1249.93176},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2000_36_2_a3/}
}
TY  - JOUR
AU  - Gordienko, Evgueni I.
AU  - Salem, Francisco
TI  - Estimates of stability of Markov control processes with unbounded costs
JO  - Kybernetika
PY  - 2000
SP  - 195
EP  - 210
VL  - 36
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/KYB_2000_36_2_a3/
LA  - en
ID  - KYB_2000_36_2_a3
ER  - 
%0 Journal Article
%A Gordienko, Evgueni I.
%A Salem, Francisco
%T Estimates of stability of Markov control processes with unbounded costs
%J Kybernetika
%D 2000
%P 195-210
%V 36
%N 2
%U http://geodesic.mathdoc.fr/item/KYB_2000_36_2_a3/
%G en
%F KYB_2000_36_2_a3
Gordienko, Evgueni I.; Salem, Francisco. Estimates of stability of Markov control processes with unbounded costs. Kybernetika, Volume 36 (2000) no. 2, pp. 195-210. http://geodesic.mathdoc.fr/item/KYB_2000_36_2_a3/

[1] Dynkin E. B., Yushkevich A. A.: Controlled Markov Processes. Springer–Verlag, New York 1979

[4] Gordienko E., Hernández-Lerma O.: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. 23 (1995), 199–218

[5] Gordienko E., Hernández-Lerma O.: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219–237

[6] Gordienko E. I., Isauro-Martínez M. E., Carrillo R. M. Marcos: Estimation of stability in controlled storage systems. Research Report No. 04.0405.I.01.001.97, Dep. de Matemáticas, Universidad Autónoma Metropolitana, México 1997

[7] Gordienko E. I., Salem F. S.: Robustness inequality for Markov control processes with unbounded costs. Systems Control Lett. 33 (1998), 125–130

[8] Hernández-Lerma O., Lasserre J. B.: Average cost optimal policies for Markov control processes with Borel state space and unbounded costs. Systems Control Lett. 15 (1990), 349–356

[9] Hernández-Lerma O., Lasserre J. B.: Discrete–time Markov Control Processes. Springer–Verlag, New York 1995

[10] Hinderer H.: Foundations of Non–Stationary Dynamic Programming with Discrete Time Parameter. (Lecture Notes in Operations Research 33.) Springer–Verlag, New York 1970

[11] Kartashov N. V.: Inequalities in theorems of ergodicity and stability for Markov chains with common phase space. II. Theory Probab. Appl. 30 (1985), 507–515

[12] Kumar P. R., Varaiya P.: Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice–Hall, Englewood Cliffs, N. J. 1986

[13] Meyn S. P., Tweedie R. L.: Markov Chains and Stochastic Stability. Springer–Verlag, Berlin 1993

[14] Nummelin E.: General Irreducible Markov Chains and Non–Negative Operators. Cambridge University Press, Cambridge 1984

[15] Rachev S. T.: Probability Metrics and the Stability of Stochastic Models. Wiley, New York 1991

[16] Scott D. J., Tweedie R. L.: Explicit rates of convergence of stochastically ordered Markov chains. In: Proc. Athens Conference of Applied Probability and Time Series Analysis: Papers in Honour of J. M. Gani and E. J. Hannan (C. C. Heyde, Yu. V. Prohorov, R. Pyke and S. T. Rachev, eds.). Springer–Verlag, New York 1995, pp. 176–191

[17] Van Dijk N. M.: Perturbation theory for unbounded Markov reward processes with applications to queueing. Adv. in Appl. Probab. 20 (1988), 99–111

[18] Van Dijk N. M., Puterman M. L.: Perturbation theory for Markov reward processes with applications to queueing systems. Adv. in Appl. Probab. 20 (1988), 79–98

[19] Weber R. R., Stidham S., Jr.: Optimal control of service rates in networks of queues. Adv. in Appl. Probab. 19 (1987), 202–218

[20] Whitt W.: Approximations of dynamic programs I. Math. Oper. Res. 3 (1978), 231–243

[21] Zolotarev V. M.: On stochastic continuity of queueing systems of type $G\vert G\vert 1$. Theory Probab. Appl. 21 (1976), 250–269