Approximation and adaptive control of Markov processes: Average reward criterion
Kybernetika, Volume 23 (1987) no. 4, pp. 265-288. This article was harvested from the Czech Digital Mathematics Library.


Classification : 60J25, 90C40, 93C40, 93E20
@article{KYB_1987_23_4_a0,
     author = {Hern\'andez-Lerma, On\'esimo},
     title = {Approximation and adaptive control of {Markov} processes: {Average} reward criterion},
     journal = {Kybernetika},
     pages = {265--288},
     year = {1987},
     volume = {23},
     number = {4},
     mrnumber = {912012},
     zbl = {0633.90091},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_1987_23_4_a0/}
}
TY  - JOUR
AU  - Hernández-Lerma, Onésimo
TI  - Approximation and adaptive control of Markov processes: Average reward criterion
JO  - Kybernetika
PY  - 1987
SP  - 265
EP  - 288
VL  - 23
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/KYB_1987_23_4_a0/
LA  - en
ID  - KYB_1987_23_4_a0
ER  - 
%0 Journal Article
%A Hernández-Lerma, Onésimo
%T Approximation and adaptive control of Markov processes: Average reward criterion
%J Kybernetika
%D 1987
%P 265-288
%V 23
%N 4
%U http://geodesic.mathdoc.fr/item/KYB_1987_23_4_a0/
%G en
%F KYB_1987_23_4_a0
Hernández-Lerma, Onésimo. Approximation and adaptive control of Markov processes: Average reward criterion. Kybernetika, Volume 23 (1987) no. 4, pp. 265-288. http://geodesic.mathdoc.fr/item/KYB_1987_23_4_a0/

[1] R. S. Acosta Abreu: Control of Markov chains with unknown parameters and metric state space. Submitted for publication. In Spanish.

[2] R. S. Acosta Abreu, O. Hernández-Lerma: Iterative adaptive control of denumerable state average-cost Markov systems. Control Cybernet. 14 (1985), 313-322. | MR

[3] V. V. Baranov: Recursive algorithms of adaptive control in stochastic systems. Cybernetics 17 (1981), 815-824. | MR

[4] V. V. Baranov: A recursive algorithm in Markovian decision processes. Cybernetics 18 (1982), 499-506. | MR | Zbl

[5] D. P. Bertsekas, S. E. Shreve: Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York 1978. | MR | Zbl

[6] A. Federgruen, P. J. Schweitzer: Nonstationary Markov decision problems with converging parameters. J. Optim. Theory Appl. 34 (1981), 207-241. | MR | Zbl

[7] A. Federgruen, H. C. Tijms: The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. 15 (1978), 356-373. | MR | Zbl

[8] J. P. Georgin: Contrôle de chaînes de Markov sur des espaces arbitraires. Ann. Inst. H. Poincaré B 14 (1978), 255-277. | MR

[9] J. P. Georgin: Estimation et contrôle de chaînes de Markov sur des espaces arbitraires. In: Lecture Notes in Mathematics 636. Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1978, pp. 71-113. | MR

[10] E. I. Gordienko: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518. | Zbl

[11] L. G. Gubenko, E. S. Statland: On controlled, discrete-time Markov decision processes. Theory Probab. Math. Statist. 7 (1975), 47-61.

[12] O. Hernández-Lerma: Approximation and adaptive policies in discounted dynamic programming. Bol. Soc. Mat. Mexicana 30 (1985). In press. | MR

[13] O. Hernández-Lerma: Nonstationary value-iteration and adaptive control of discounted semi-Markov processes. J. Math. Anal. Appl. 112 (1985), 435-445. | MR

[14] O. Hernández-Lerma, S. I. Marcus: Adaptive control of service in queueing systems. Syst. Control Lett. 3 (1983), 283-289. | MR | Zbl

[15] O. Hernández-Lerma, S. I. Marcus: Optimal adaptive control of priority assignment in queueing systems. Syst. Control Lett. 4 (1984), 65-75. | MR

[16] O. Hernández-Lerma, S. I. Marcus: Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution. Submitted for publication, 1986. | MR

[17] O. Hernández-Lerma, S. I. Marcus: Nonparametric adaptive control of discrete-time partially observable stochastic systems. Submitted for publication, 1986.

[18] C. J. Himmelberg, T. Parthasarathy, F. S. Van Vleck: Optimal plans for dynamic programming problems. Math. Oper. Res. 1 (1976), 390-394. | MR

[19] K. Hinderer: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer-Verlag, Berlin-Heidelberg-New York 1970. | MR | Zbl

[20] A. Hordijk, P. J. Schweitzer, H. Tijms: The asymptotic behaviour of the minimal total expected cost for the denumerable state Markov decision model. J. Appl. Probab. 12 (1975), 298-305. | MR

[21] P. R. Kumar: A survey of some results in stochastic adaptive control. SIAM J. Control Optim. 23 (1985), 329-380. | MR | Zbl

[22] M. Kurano: Discrete-time Markovian decision processes with an unknown parameter - average return criterion. J. Oper. Res. Soc. Japan 15 (1972), 67-76. | MR | Zbl

[23] M. Kurano: Average-optimal adaptive policies in semi-Markov decision processes including an unknown parameter. J. Oper. Res. Soc. Japan 28 (1985), 252-266. | MR | Zbl

[24] P. Mandl: Estimation and control in Markov chains. Adv. Appl. Probab. 6 (1974), 40-60. | MR | Zbl

[25] P. Mandl: On the adaptive control of countable Markov chains. In: Probability Theory, Banach Center Publications 5, PWN-Polish Scientific Publishers, Warsaw 1979, pp. 159-173. | MR | Zbl

[26] H. L. Royden: Real Analysis. Macmillan, New York 1968. | MR

[27] M. Schäl: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrsch. verw. Gebiete 32 (1975), 179-196. | MR

[28] M. Schäl: Estimation and control in discounted stochastic dynamic programming. Preprint No. 428, Institute for Applied Math., University of Bonn, Bonn 1981. | MR

[29] H. C. Tijms: On dynamic programming with arbitrary state space, compact action space and the average reward as criterion. Report BW 55/75, Mathematisch Centrum, Amsterdam 1975.

[30] T. Ueno: Some limit theorems for temporally discrete Markov processes. J. Fac. Science, University of Tokyo 7 (1957), 449-462. | MR | Zbl

[31] D. J. White: Dynamic programming, Markov chains, and the method of successive approximations. J. Math. Anal. Appl. 6 (1963), 373-376. | MR

[32] P. Mandl, G. Hübner: Transient phenomena and self-optimizing control of Markov chains. Acta Universitatis Carolinae - Math. et Phys. 26 (1985), 1, 35-51. | MR

[33] A. Hordijk, H. Tijms: A modified form of the iterative method of dynamic programming. Ann. Statist. 3 (1975), 1, 203-208. | MR | Zbl