Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
Kybernetika, Volume 34 (1998) no. 2, pp. 217-234. This article was harvested from the Czech Digital Mathematics Library.


We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrence equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$, we propose a statistical estimation procedure for $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
Classification : 60J05, 62M05, 93C40, 93E35
Keywords: Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence
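The model in the abstract, a recurrence $x_{t+1}=F(x_t,a_t,\xi_t)$ with observable i.i.d. noise whose unknown density $\rho$ is estimated online, can be illustrated with a small sketch. All concrete choices below (the linear dynamics, quadratic cost, Gaussian kernel, bandwidth, and one-step greedy policy) are illustrative assumptions, not the paper's construction:

```python
import numpy as np

# Illustrative sketch (not the paper's construction): a scalar controlled
# process x_{t+1} = F(x_t, a_t, xi_t) = x_t + a_t + xi_t, where the i.i.d.
# noise density rho is unknown but each xi_t is observed after the transition.
rng = np.random.default_rng(0)

def F(x, a, xi):
    return x + a + xi

def kde(samples, z, h=0.3):
    """Gaussian kernel density estimate of rho at the points z."""
    z = np.atleast_1d(z)[:, None]
    return np.exp(-0.5 * ((z - samples[None, :]) / h) ** 2).sum(axis=1) / (
        len(samples) * h * np.sqrt(2 * np.pi))

def greedy_action(x, samples, actions, beta=0.9):
    """Certainty-equivalence step: choose the action minimizing the one-stage
    cost plus the discounted expected next-state cost, with the expectation
    taken under the empirical distribution of the observed noise."""
    costs = [x**2 + a**2 + beta * np.mean(F(x, a, samples) ** 2)
             for a in actions]
    return actions[int(np.argmin(costs))]

actions = np.linspace(-2.0, 2.0, 41)
x, noise_obs = 5.0, []
for t in range(200):
    samples = np.array(noise_obs) if noise_obs else np.zeros(1)
    a = greedy_action(x, samples, actions)
    xi = rng.normal(0.0, 0.5)        # the true rho, unknown to the controller
    x_next = F(x, a, xi)
    noise_obs.append(x_next - x - a)  # xi_t recovered exactly (observability)
    x = x_next

est = kde(np.array(noise_obs), np.array([0.0]))[0]
print(round(x, 3), round(est, 3))  # state steered near 0; KDE of rho at 0
```

The adaptive element is that the controller never sees $\rho$: it recovers each $\xi_t$ from the observed transition and plugs the growing empirical sample into the policy computation, which is the certainty-equivalence idea behind the adaptive policies the paper analyzes.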
@article{KYB_1998_34_2_a8,
     author = {Gordienko, Evgueni I. and Minj\'arez-Sosa, J. Adolfo},
     title = {Adaptive control for discrete-time {Markov} processes with unbounded costs: {Discounted} criterion},
     journal = {Kybernetika},
     pages = {217--234},
     year = {1998},
     volume = {34},
     number = {2},
     mrnumber = {1621512},
     zbl = {1274.90474},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_1998_34_2_a8/}
}
TY  - JOUR
AU  - Gordienko, Evgueni I.
AU  - Minjárez-Sosa, J. Adolfo
TI  - Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
JO  - Kybernetika
PY  - 1998
SP  - 217
EP  - 234
VL  - 34
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/KYB_1998_34_2_a8/
LA  - en
ID  - KYB_1998_34_2_a8
ER  - 
%0 Journal Article
%A Gordienko, Evgueni I.
%A Minjárez-Sosa, J. Adolfo
%T Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
%J Kybernetika
%D 1998
%P 217-234
%V 34
%N 2
%U http://geodesic.mathdoc.fr/item/KYB_1998_34_2_a8/
%G en
%F KYB_1998_34_2_a8
Gordienko, Evgueni I.; Minjárez-Sosa, J. Adolfo. Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion. Kybernetika, Volume 34 (1998) no. 2, pp. 217-234. http://geodesic.mathdoc.fr/item/KYB_1998_34_2_a8/

[1] Agrawal R.: Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition. J. Appl. Probab. 28 (1991), 779–790 | DOI | MR | Zbl

[2] Ash R. B.: Real Analysis and Probability. Academic Press, New York 1972 | MR

[3] Cavazos–Cadena R.: Nonparametric adaptive control of discounted stochastic systems with compact state space. J. Optim. Theory Appl. 65 (1990), 191–207 | DOI | MR

[4] Dynkin E. B., Yushkevich A. A.: Controlled Markov Processes. Springer–Verlag, New York 1979 | MR

[5] Fernández–Gaucherand E., Arapostathis A., Marcus S. I.: A methodology for the adaptive control of Markov chains under partial state information. In: Proc. of the 1992 Conf. on Information Sci. and Systems, Princeton, New Jersey, pp. 773–775

[6] Fernández–Gaucherand E., Arapostathis A., Marcus S. I.: Analysis of an adaptive control scheme for a partially observed controlled Markov chain. IEEE Trans. Automat. Control 38 (1993), 987–993 | DOI | MR | Zbl

[7] Gordienko E. I.: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504–518 | Zbl

[8] Gordienko E. I.: Controlled Markov sequences with slowly varying characteristics II. Adaptive optimal strategies. Soviet J. Comput. Systems Sci. 23 (1985), 87–93 | MR | Zbl

[9] Gordienko E. I., Hernández–Lerma O.: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219–237 | MR | Zbl

[10] Gordienko E. I., Montes–de–Oca R., Minjárez–Sosa J. A.: Approximation of average cost optimal policies for general Markov decision processes with unbounded costs. Math. Methods Oper. Res. 45 (1997), 2, to appear | DOI | MR | Zbl

[11] Hasminskii R., Ibragimov I.: On density estimation in the view of Kolmogorov’s ideas in approximation theory. Ann. of Statist. 18 (1990), 999–1010 | DOI | MR | Zbl

[12] Hernández–Lerma O.: Adaptive Markov Control Processes. Springer–Verlag, New York 1989 | MR | Zbl

[13] Hernández–Lerma O.: Infinite–horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality. Reporte Interno 165. Departamento de Matemáticas, CINVESTAV-IPN, A.P. 14-740.07000, México, D. F., México (1994). (Submitted for publication)

[14] Hernández–Lerma O., Cavazos–Cadena R.: Density estimation and adaptive control of Markov processes: average and discounted criteria. Acta Appl. Math. 20 (1990), 285–307 | DOI | MR | Zbl

[15] Hernández–Lerma O., Lasserre J. B.: Discrete–Time Markov Control Processes. Springer–Verlag, New York 1995 | Zbl

[16] Hernández–Lerma O., Marcus S. I.: Adaptive control of discounted Markov decision chains. J. Optim. Theory Appl. 46 (1985), 227–235 | DOI | MR | Zbl

[17] Hernández–Lerma O., Marcus S. I.: Adaptive policies for discrete–time stochastic control systems with unknown disturbance distribution. Systems Control Lett. 9 (1987), 307–315 | DOI | MR

[18] Hinderer K.: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer–Verlag, Berlin – Heidelberg – New York 1970 | MR | Zbl

[19] Köthe G.: Topological Vector Spaces I. Springer–Verlag, New York 1969 | MR

[20] Kumar P. R., Varaiya P.: Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice–Hall, Englewood Cliffs 1986 | Zbl

[21] Lippman S. A.: On dynamic programming with unbounded rewards. Management Sci. 21 (1975), 1225–1233 | DOI | MR | Zbl

[22] Mandl P.: Estimation and control in Markov chains. Adv. in Appl. Probab. 6 (1974), 40–60 | DOI | MR | Zbl

[23] Rieder U.: Measurable selection theorems for optimization problems. Manuscripta Math. 24 (1978), 115–131 | DOI | MR | Zbl

[24] Schäl M.: Estimation and control in discounted stochastic dynamic programming. Stochastics 20 (1987), 51–71 | DOI | MR

[25] Stettner L.: On nearly self-optimizing strategies for a discrete–time uniformly ergodic adaptive model. Appl. Math. Optim. 27 (1993), 161–177 | DOI | MR | Zbl

[26] Stettner L.: Ergodic control of Markov process with mixed observation structure. Dissertationes Math. 341 (1995), 1–36 | MR

[27] Nunen J. A. E. E. van, Wessels J.: A note on dynamic programming with unbounded rewards. Management Sci. 24 (1978), 576–580 | DOI