Growth rates and average optimality in risk-sensitive Markov decision chains
Kybernetika, Tome 44 (2008) no. 2, pp. 205-226 Cet article a éte moissonné depuis la source Czech Digital Mathematics Library

Voir la notice de l'article

In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.
In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.
Classification : 60J10, 90C39, 90C40, 93E20
Keywords: risk-sensitive Markov decision chains; average optimal policies; optimal growth rates
@article{KYB_2008_44_2_a5,
     author = {Sladk\'y, Karel},
     title = {Growth rates and average optimality in risk-sensitive {Markov} decision chains},
     journal = {Kybernetika},
     pages = {205--226},
     year = {2008},
     volume = {44},
     number = {2},
     mrnumber = {2428220},
     zbl = {1154.90612},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2008_44_2_a5/}
}
TY  - JOUR
AU  - Sladký, Karel
TI  - Growth rates and average optimality in risk-sensitive Markov decision chains
JO  - Kybernetika
PY  - 2008
SP  - 205
EP  - 226
VL  - 44
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/KYB_2008_44_2_a5/
LA  - en
ID  - KYB_2008_44_2_a5
ER  - 
%0 Journal Article
%A Sladký, Karel
%T Growth rates and average optimality in risk-sensitive Markov decision chains
%J Kybernetika
%D 2008
%P 205-226
%V 44
%N 2
%U http://geodesic.mathdoc.fr/item/KYB_2008_44_2_a5/
%G en
%F KYB_2008_44_2_a5
Sladký, Karel. Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika, Tome 44 (2008) no. 2, pp. 205-226. http://geodesic.mathdoc.fr/item/KYB_2008_44_2_a5/

[1] Bellman R.: On a quasi-linear equation. Canadian J. Math. 8 (1956), 198–202 | MR | Zbl

[2] Bellman R.: Dynamic Programming. Princenton Univ. Press, Princenton, N.J. 1957 | MR | Zbl

[3] Berman A., Plemmons R. J.: Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York 1979 | MR | Zbl

[4] Bielecki T. D., Hernández-Hernández, D., Pliska S. R.: Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management. Math. Methods Oper. Res. 50 (1999), 167–188 | MR

[5] Cavazos-Cadena R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math. Methods Oper. Res. 57 (2003), 253–285 | MR | Zbl

[6] Cavazos-Cadena R., Montes-de-Oca R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Math. Oper. Res. 28 (2003), 752–756 | MR | Zbl

[7] Cavazos-Cadena R., Montes-de-Oca R.: Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion. J. Appl. Probab. 42 (2005), 905–918 | MR | Zbl

[8] Cavazos-Cadena R., Hernández-Hernández D.: Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach. Math. Methods Oper. Res. 56 (2002), 473–479 | MR

[9] Cavazos-Cadena R., Hernández-Hernández D.: A characterization exponential functionals in finite Markov chains. Math. Methods Oper. Res. 60 (2004), 399–414 | MR

[10] Cavazos-Cadena R., Hernández-Hernández D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Ann. Appl. Probab. 15 (2005), 175–212 | MR | Zbl

[11] Gantmakher F. R.: The Theory of Matrices. Chelsea, London 1959

[12] Howard R. A., Matheson J.: Risk-sensitive Markov decision processes. Manag. Sci. 23 (1972), 356–369 | MR | Zbl

[13] Jaquette S. A.: A utility criterion for Markov decision processes. Manag. Sci. 23 (1976), 43–49 | MR | Zbl

[14] Mandl P.: Decomposable non-negative matrices in a dynamic programming problem. Czechoslovak Math. J. 20 (1970), 504–510 | MR | Zbl

[15] Rothblum U. G., Whittle P.: Growth optimality for branching Markov decision chains. Math. Oper. Res. 7 (1982), 582–601 | MR | Zbl

[16] Sladký K.: On dynamic programming recursions for multiplicative Markov decision chains. Math. Programming Study 6 (1976), 216–226 | MR | Zbl

[17] Sladký K.: Successive approximation methods for dynamic programming models. In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 | Zbl

[18] Sladký K.: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980), 526–547 | MR | Zbl

[19] Whittle P.: Optimization Over Time – Dynamic Programming and Stochastic Control. Volume II, Chapter 35, Wiley, Chichester 1983 | MR

[20] Zijm W. H. M.: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam 1983 | MR | Zbl