Keywords: Markov decision processes; average cost criterion; approximate value iteration algorithm; contraction and non-expansive operators; perturbed Markov decision models
@article{10_14736_kyb_2019_1_0081,
author = {Vega-Amaya, \'Oscar and L\'opez-Borb\'on, Joaqu{\'\i}n},
title = {A perturbation approach to approximate value iteration for average cost {Markov} decision processes with {Borel} spaces and bounded costs},
journal = {Kybernetika},
pages = {81--113},
year = {2019},
volume = {55},
number = {1},
doi = {10.14736/kyb-2019-1-0081},
mrnumber = {3935416},
zbl = {07088880},
language = {en},
url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0081/}
}
TY - JOUR
AU - Vega-Amaya, Óscar
AU - López-Borbón, Joaquín
TI - A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
JO - Kybernetika
PY - 2019
SP - 81
EP - 113
VL - 55
IS - 1
UR - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0081/
DO - 10.14736/kyb-2019-1-0081
LA - en
ID - 10_14736_kyb_2019_1_0081
ER -
%0 Journal Article
%A Vega-Amaya, Óscar
%A López-Borbón, Joaquín
%T A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
%J Kybernetika
%D 2019
%P 81-113
%V 55
%N 1
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2019-1-0081/
%R 10.14736/kyb-2019-1-0081
%G en
%F 10_14736_kyb_2019_1_0081
Vega-Amaya, Óscar; López-Borbón, Joaquín. A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs. Kybernetika, Volume 55 (2019) no. 1, pp. 81-113. doi: 10.14736/kyb-2019-1-0081
[1] Abounadi, J., Bertsekas, D., Borkar, V. S.: Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40 (2001), 681-698. | DOI | MR
[2] Aliprantis, C. D., Border, K. C.: Infinite Dimensional Analysis. Third edition. Springer-Verlag, Berlin 2006. | MR
[3] Almudevar, A.: Approximate fixed point iteration with an application to infinite horizon Markov decision processes. SIAM J. Control Optim. 47 (2008), 2303-2347. | DOI | MR
[4] Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., Marcus, S. I.: Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31 (1993), 282-344. | DOI | MR
[5] Beutel, L., Gonska, H., Kacsó, D.: On variation-diminishing Schoenberg operators: new quantitative statements. In: Multivariate Approximation and Interpolations with Applications (M. Gasca, ed.), Monografías de la Academia de Ciencias de Zaragoza No. 20, 2002, pp. 9-58. | MR
[6] Bertsekas, D. P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs NJ 1987. | MR | Zbl
[7] Bertsekas, D. P., Tsitsiklis, J. N.: Neuro-Dynamic Programming. Athena Scientific, Belmont 1996. | MR | Zbl
[8] Bertsekas, D. P.: Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9 (2011), 310-335. | DOI | MR
[9] Chang, H. S., Marcus, S. I.: Approximate receding horizon approach for Markov decision processes: average reward case. J. Math. Anal. Appl. 286 (2003), 636-651. | DOI | MR
[10] Chang, H. S., Hu, J., Fu, M. C., Marcus, S. I.: Simulation-Based Algorithms for Markov Decision Processes. Second edition. Springer-Verlag, London 2013. | DOI | MR
[11] Cooper, W. L., Henderson, S. G., Lewis, M. E.: Convergence of simulation-based policy iteration. Prob. Eng. Inform. Sci. 17 (2003), 213-234. | DOI | MR
[12] DeVore, R. A.: The Approximation of Continuous Functions by Positive Linear Operators. Lecture Notes in Mathematics 293. Springer-Verlag, Berlin, Heidelberg 1972. | DOI | MR
[13] Dorea, C. C. Y., Pereira, A. G. C.: A note on a variation of Doeblin's condition for uniform ergodicity of Markov chains. Acta Math. Hungar. 110 (2006), 287-292. | DOI | MR
[14] Dufour, F., Prieto-Rumeau, T.: Approximation of Markov decision processes with general state space. J. Math. Anal. Appl. 388 (2012), 1254-1267. | DOI | MR
[15] Dufour, F., Prieto-Rumeau, T.: Stochastic approximations of constrained discounted Markov decision processes. J. Math. Anal. Appl. 413 (2014), 856-879. | DOI | MR
[16] Dufour, F., Prieto-Rumeau, T.: Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics 87 (2015), 273-307. | DOI | MR
[17] Farias, D. P. de, Roy, B. van: On the existence of fixed points for approximate value iteration and temporal difference learning. J. Optim. Theory Appl. 105 (2000), 589-608. | DOI | MR
[18] Farias, D. P. de, Roy, B. van: Approximate linear programming for average-cost dynamic programming. In: Advances in Neural Information Processing Systems 15 (S. Becker, S. Thrun and K. Obermayer, eds.), MIT Press, Cambridge MA 2002, pp. 1587-1594.
[19] Farias, D. P. de, Roy, B. van: A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees. Math. Oper. Res. 31 (2006), 597-620. | DOI | MR
[20] Gordon, G. J.: Stable function approximation in dynamic programming. In: Proc. Twelfth International Conference on Machine Learning (A. Prieditis and S. J. Russell, eds.), Tahoe City CA 1995, pp. 261-268. | DOI
[21] Gosavi, A.: A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis. Machine Learning 55 (2004), 5-29. | DOI | MR
[22] Hernández-Lerma, O.: Adaptive Markov Control Processes. Springer-Verlag, NY 1989. | DOI | MR | Zbl
[23] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes. Basic Optimality Criteria. Springer-Verlag, NY 1996. | DOI | MR | Zbl
[24] Hernández-Lerma, O., Lasserre, J. B.: Further Topics on Discrete-Time Markov Control Processes. Springer-Verlag, NY 1999. | DOI | MR | Zbl
[25] Hernández-Lerma, O., Lasserre, J. B.: Markov Chains and Invariant Probabilities. Birkhäuser Verlag, Basel 2003. | DOI | MR | Zbl
[26] Hernández-Lerma, O., Montes-de-Oca, R., Cavazos-Cadena, R.: Recurrence conditions for Markov decision processes with Borel spaces: a survey. Ann. Oper. Res. 29 (1991), 29-46. | DOI | MR
[27] Hernández-Lerma, O., Vega-Amaya, O., Carrasco, G.: Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optim. 38 (1999), 79-93. | DOI | MR | Zbl
[28] Jaśkiewicz, A., Nowak, A. S.: On the optimality equation for average cost Markov control processes with Feller transition probabilities. J. Math. Anal. Appl. 316 (2006), 495-509. | DOI | MR
[29] Klein, E., Thompson, A. C.: Theory of Correspondences. Wiley, New York 1984. | MR
[30] Konda, V. R., Tsitsiklis, J. N.: Actor-critic algorithms. SIAM J. Control Optim. 42 (2003), 1143-1166. | DOI | MR
[31] Lee, J. M., Lee, J. H.: Approximate dynamic programming strategies and their applicability for process control: a review and future directions. Int. J. Control Automat. Systems 2 (2004), 263-278.
[32] Meyn, S. P., Tweedie, R. L.: Markov Chains and Stochastic Stability. Springer-Verlag, London 1993. | DOI | MR
[33] Montes-de-Oca, R., Lemus-Rodríguez, E.: An unbounded Berge's minimum theorem with applications to discounted Markov decision processes. Kybernetika 48 (2012), 268-286. | MR | Zbl
[34] Munos, R.: Performance bounds in $L_{p}$-norm for approximate value iteration. SIAM J. Control Optim. 46 (2007), 541-561. | DOI | MR
[35] Nowak, A. S.: A generalization of Ueno's inequality for n-step transition probabilities. Appl. Math. 25 (1998), 295-299. | DOI | MR
[36] Ortner, R.: Pseudometrics for state aggregation in average reward Markov decision processes. In: Algorithmic Learning Theory LNAI 4754 (M. Hutter, R. A. Servedio and E. Takimoto, eds.), Springer, Berlin, Heidelberg 2007, pp. 373-387. | DOI
[37] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, NY 1994. | DOI | MR | Zbl
[38] Powell, W. B.: Approximate Dynamic Programming. Solving the Curses of Dimensionality. John Wiley and Sons Inc., 2007. | DOI | MR
[39] Powell, W. B.: What you should know about approximate dynamic programming. Naval Res. Logist. 56 (2009), 239-249. | DOI | MR
[40] Powell, W. B.: Perspectives of approximate dynamic programming. Ann. Oper. Res. 241 (2012), 319-356. | DOI | MR
[41] Powell, W. B., Ma, J.: A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications. J. Control Theory Appl. 9 (2011), 336-352. | DOI | MR
[42] Robles-Alcaraz, M. T., Vega-Amaya, O., Minjárez-Sosa, J. A.: Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces. Risk Decision Anal. 6 (2017), 79-95. | DOI
[43] Rust, J.: Numerical dynamic programming in economics. In: Handbook of Computational Economics, Vol. 1 (H. Amman, D. Kendrick and J. Rust, eds.), North-Holland, Amsterdam 1996, pp. 619-728. | DOI | MR
[44] Santos, M. S.: Analysis of a numerical dynamic programming algorithm applied to economic models. Econometrica 66 (1998), 409-426. | DOI | MR
[45] Santos, M. S., Rust, J.: Convergence properties of policy iteration. SIAM J. Control Optim. 42 (2004), 2094-2115. | DOI | MR
[46] Saldi, N., Yüksel, S., Linder, T.: Asymptotic optimality of finite approximations to Markov decision processes with Borel spaces. Math. Oper. Res. 42 (2017), 945-978. | DOI | MR
[47] Stachurski, J.: Continuous state dynamic programming via nonexpansive approximation. Comput. Economics 31 (2008), 141-160. | DOI
[48] Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge MA 1998. | DOI
[49] Roy, B. van: Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31 (2006), 234-244. | DOI | MR
[50] Vega-Amaya, O.: The average cost optimality equation: a fixed-point approach. Bol. Soc. Mat. Mexicana 9 (2003), 185-195. | MR
[51] Vega-Amaya, O.: Zero-sum average semi-Markov games: fixed point solutions of the Shapley equation. SIAM J. Control Optim. 42 (2003), 1876-1894. | DOI | MR
[52] Vega-Amaya, O.: Solutions of the average cost optimality equation for Markov decision processes with weakly continuous kernel: The fixed-point approach revisited. J. Math. Anal. Appl. 464 (2018), 152-163. | DOI | MR
[53] Vega-Amaya, O., López-Borbón, J.: A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces. J. Dyn. Games 3 (2016), 261-278. | DOI | MR
[54] Vega-Amaya, O., Montes-de-Oca, R.: Application of average dynamic programming to inventory systems. Math. Methods Oper. Res. 47 (1998), 451-471. | DOI | MR