Mean-variance optimality for semi-Markov decision processes under first passage criteria
Kybernetika, Volume 53 (2017) no. 1, pp. 59-81
This article was harvested from the Czech Digital Mathematics Library

This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of the total discounted reward accumulated up to the system's first entry into a target set, where the optimization is over the class of policies with a prescribed expected first passage reward. The reward rates may be unbounded, and the discount factor may vary with the state of the system and the control. We first develop suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Two examples then illustrate the results. Finally, we show how the results reduce to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.
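
For orientation, the problem described in the abstract can be written schematically as below. The notation is an assumption introduced here for illustration and is not taken from the paper: $B$ denotes the target set, $\tau_B$ the first passage time into $B$, $r$ the reward rate, $\alpha$ the state- and control-dependent discount factor, $g$ the prescribed expected first passage reward, and $\Pi_g$ the class of policies attaining it.

```latex
% Schematic formulation only; all symbols are illustrative assumptions,
% not the paper's own notation.
\[
  R_B \;=\; \int_0^{\tau_B}
      \exp\!\Big(-\int_0^t \alpha(x_s,a_s)\,ds\Big)\, r(x_t,a_t)\,dt ,
  \qquad
  \tau_B \;=\; \inf\{\, t \ge 0 : x_t \in B \,\},
\]
\[
  \text{minimize}\quad
  \operatorname{Var}^{\pi}_{x}(R_B)
  \;=\; \mathbb{E}^{\pi}_{x}\big[R_B^{2}\big]
        - \big(\mathbb{E}^{\pi}_{x}[R_B]\big)^{2}
  \quad\text{over}\quad
  \pi \in \Pi_g \;=\; \big\{\pi : \mathbb{E}^{\pi}_{x}[R_B] = g(x)\big\}.
\]
```

Because every policy in $\Pi_g$ has the same mean $g(x)$, minimizing the variance over $\Pi_g$ amounts to minimizing the second moment $\mathbb{E}^{\pi}_{x}[R_B^{2}]$ over that class.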
DOI: 10.14736/kyb-2017-1-0059
Classification: 60J27, 90C40
Keywords: semi-Markov decision processes; first passage time; unbounded reward rate; minimal variance; mean-variance optimal policy
@article{10_14736_kyb_2017_1_0059,
     author = {Huang, Xiangxiang and Huang, Yonghui},
     title = {Mean-variance optimality for {semi-Markov} decision processes under first passage criteria},
     journal = {Kybernetika},
     pages = {59--81},
     year = {2017},
     volume = {53},
     number = {1},
     doi = {10.14736/kyb-2017-1-0059},
     mrnumber = {3638556},
     zbl = {06738594},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2017-1-0059/}
}
Huang, Xiangxiang; Huang, Yonghui. Mean-variance optimality for semi-Markov decision processes under first passage criteria. Kybernetika, Volume 53 (2017) no. 1, pp. 59-81. doi: 10.14736/kyb-2017-1-0059