Turnpikes in finite Markov decision processes and random walk
Teoriâ veroâtnostej i ee primeneniâ, Vol. 68 (2023), no. 1, pp. 147–176. This article was harvested from the Math-Net.Ru source.

In this paper, we revise the theory of turnpikes in discounted Markov decision processes, prove the turnpike theorem for the undiscounted model, and apply the results to a specific random walk.
Keywords: turnpike, Markov decision process, discounted reward, average reward, random walk, stochastic knapsack problem.
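A minimal sketch (not taken from the paper, and built on a made-up two-state example) of the turnpike effect behind the abstract: in a discounted finite Markov decision process solved by backward induction, the optimal first-stage decision rule eventually stops depending on the horizon length and coincides with the optimal stationary policy of the infinite-horizon discounted problem. The transition data, the discount factor beta, and the function name horizon_policy below are illustrative assumptions only.

# Turnpike illustration for a toy discounted MDP (Python).
# States 0, 1; actions 0, 1.
# P[a][s][t]: probability of moving from state s to state t under action a.
# R[a][s]:    one-step reward for using action a in state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.4, 0.6], [0.7, 0.3]],   # action 1
]
R = [
    [1.0, 0.0],                 # action 0
    [0.0, 2.0],                 # action 1
]
beta = 0.9                      # discount factor (assumed)

def horizon_policy(T):
    """Optimal first-stage decision rule of the T-step discounted problem."""
    v = [0.0, 0.0]              # terminal value function is zero
    policy = [0, 0]
    for _ in range(T):          # backward induction over the horizon
        new_v, policy = [], []
        for s in (0, 1):
            q = [R[a][s] + beta * sum(P[a][s][t] * v[t] for t in (0, 1))
                 for a in (0, 1)]
            best = max((0, 1), key=lambda a: q[a])
            policy.append(best)
            new_v.append(q[best])
        v = new_v
    return policy               # rule used when T steps remain, i.e., at the first stage

# The smallest horizon after which this first-stage rule no longer changes is the
# turnpike horizon; beyond it the rule equals the optimal stationary policy.
for T in (1, 2, 5, 20, 100):
    print(T, horizon_policy(T))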
@article{TVP_2023_68_1_a8,
     author = {A. B. Piunovskiy},
     title = {Turnpikes in finite {Markov} decision processes and random walk},
     journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
     pages = {147--176},
     year = {2023},
     volume = {68},
     number = {1},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/TVP_2023_68_1_a8/}
}
TY  - JOUR
AU  - A. B. Piunovskiy
TI  - Turnpikes in finite Markov decision processes and random walk
JO  - Teoriâ veroâtnostej i ee primeneniâ
PY  - 2023
SP  - 147
EP  - 176
VL  - 68
IS  - 1
UR  - http://geodesic.mathdoc.fr/item/TVP_2023_68_1_a8/
LA  - ru
ID  - TVP_2023_68_1_a8
ER  - 
%0 Journal Article
%A A. B. Piunovskiy
%T Turnpikes in finite Markov decision processes and random walk
%J Teoriâ veroâtnostej i ee primeneniâ
%D 2023
%P 147-176
%V 68
%N 1
%U http://geodesic.mathdoc.fr/item/TVP_2023_68_1_a8/
%G ru
%F TVP_2023_68_1_a8
A. B. Piunovskiy. Turnpikes in finite Markov decision processes and random walk. Teoriâ veroâtnostej i ee primeneniâ, Vol. 68 (2023), no. 1, pp. 147–176. http://geodesic.mathdoc.fr/item/TVP_2023_68_1_a8/

[1] K. Chen, S. M. Ross, “An adaptive stochastic knapsack problem”, European J. Oper. Res., 239:3 (2014), 625–635 | DOI | MR | Zbl

[2] B. C. Dean, M. X. Goemans, J. Vondrák, “Approximating the stochastic knapsack problem: the benefit of adaptivity”, Math. Oper. Res., 33:4 (2008), 945–964 | DOI | MR | Zbl

[3] E. V. Denardo, U. G. Rothblum, “A turnpike theorem for a risk-sensitive Markov decision process with stopping”, SIAM J. Control Optim., 45:2 (2006), 414–431 | DOI | MR | Zbl

[4] R. Dorfman, P. A. Samuelson, R. M. Solow, Linear programming and economic analysis, McGraw-Hill Book Co., Inc., New York–Toronto–London, 1958, ix+525 pp. | MR | Zbl

[5] E. B. Dynkin, A. A. Yushkevich, Controlled Markov processes, Grundlehren Math. Wiss., 235, Springer-Verlag, Berlin–New York, 1979, xvii+289 pp. | MR | Zbl

[6] A. Federgruen, P. J. Schweitzer, “Discounted and undiscounted value-iteration in Markov decision problems: a survey”, Dynamic programming and its applications (Univ. British Columbia, Vancouver, BC, 1977), Academic Press, New York–London, 1978, 23–52 | DOI | MR | Zbl

[7] O. Hernández-Lerma, J. B. Lasserre, “A forecast horizon and a stopping rule for general Markov decision processes”, J. Math. Anal. Appl., 132:2 (1988), 388–400 | DOI | MR | Zbl

[8] K. Hinderer, K.-H. Waldmann, “Algorithms for countable state Markov decision models with an absorbing set”, SIAM J. Control Optim., 43:6 (2005), 2109–2131 | DOI | MR | Zbl

[9] R. A. Howard, Dynamic programming and Markov processes, The Technology Press of M.I.T., Cambridge, MA; John Wiley & Sons, Inc., New York–London, 1960, viii+136 pp. | MR | Zbl

[10] T. Iida, M. Mori, “Markov decision processes with random horizon”, J. Oper. Res. Soc. Japan, 39:4 (1996), 592–603 | DOI | MR | Zbl

[11] M. Jacobson, N. Shimkin, A. Shwartz, “Markov decision processes with slow scale periodic decisions”, Math. Oper. Res., 28:4 (2003), 777–800 | DOI | MR | Zbl

[12] L. C. M. Kallenberg, Markov decision processes, Lecture notes, Univ. of Leiden, 2010, x+434 pp. https://www.math.leidenuniv.nl/~spieksma/colleges/LNMB-MDP/Lecture-notes-Kallenberg.pdf

[13] J. G. Kemeny, J. L. Snell, Finite Markov chains, Univ. Ser. Undergrad. Math., Van Nostrand Co., Inc., Princeton, NJ–Toronto–London–New York, 1960, viii+210 pp. | MR | Zbl

[14] V. N. Kolokoltsov, “Turnpikes and infinite extremals in Markov decision processes”, Matem. zametki, 46:4 (1989), 118–120 | MR | Zbl

[15] M. E. Lewis, A. Paul, “Uniform turnpike theorems for finite Markov decision processes”, Math. Oper. Res., 44:4 (2019), 1145–1160 | DOI | MR | Zbl

[16] T. E. Morton, “Decision horizons in discrete time undiscounted Markov renewal programming”, IEEE Trans. Systems Man Cybernet., SMC-4:4 (1974), 392–394 | DOI | MR | Zbl

[17] A. R. Odoni, “On finding the maximal gain for Markov decision processes”, Operations Res., 17:5 (1969), 857–860 | DOI | MR | Zbl

[18] A. B. Piunovskiy, Optimal control of random sequences in problems with constraints, Math. Appl., 410, Kluwer Acad. Publ., Dordrecht, 1997, xii+345 pp. | DOI | MR | Zbl

[19] A. B. Piunovskiy, Examples in Markov decision processes, Imp. Coll. Press Optim. Ser., 2, Imp. Coll. Press, London, 2013, xiv+293 pp. | DOI | MR | Zbl

[20] A. B. Piunovskiy, “Controlled random walk: conjecture and counter-example”, Modern trends in controlled stochastic processes: theory and applications, v. 3, Emerg. Complex. Comput., 41, Springer, Cham, 2021, 38–56 | DOI | MR | Zbl

[21] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, Wiley Ser. Probab. Math. Statist. Appl. Probab. Statist., John Wiley & Sons, Inc., New York, 1994, xx+649 pp. | DOI | MR | Zbl

[22] J. F. Shapiro, “Turnpike planning horizons for a Markovian decision model”, Management Sci., 14:5 (1968), 292–300 | DOI | Zbl

[23] A. J. Zaslavski, Turnpike properties in the calculus of variations and optimal control, Nonconvex Optim. Appl., 80, Springer, New York, 2006, xxii+395 pp. | DOI | MR | Zbl

[24] A. J. Zaslavski, Turnpike theory for the Robinson–Solow–Srinivasan model, Springer Optim. Appl., 166, Springer, Cham, 2020, x+442 pp. | DOI | MR | Zbl