The existence of a stationary $\varepsilon$-optimal policy for a finite Markov chain
Teoriâ veroâtnostej i ee primeneniâ, Vol. 23 (1978) no. 2, pp. 313-330
This article was harvested from the source Math-Net.Ru
The existence of a stationary average-reward $\varepsilon$-optimal policy is proved for discrete-time Markov decision chains with finitely many states, compact sets of actions, continuous transition functions, and upper semicontinuous reward functions.
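The criterion named in the abstract can be written out as follows. This is a sketch of the standard average-reward setup, not a quotation from the paper: the symbols $w$, $\pi$, $\varphi$, $r$, $x_t$, $a_t$ and the use of the lower limit are assumptions made here for illustration, and the paper's own notation and choice of limit may differ.

```latex
% Assumed notation (not taken from the paper): initial state x, policy \pi,
% one-step reward r, and state x_t and action a_t at time t.
\[
  w(x,\pi) \;=\; \liminf_{n\to\infty} \frac{1}{n}\,
  \mathbb{E}_x^{\pi} \sum_{t=0}^{n-1} r(x_t, a_t).
\]
% A stationary policy \varphi is \varepsilon-optimal if, for every initial state x,
\[
  w(x,\varphi) \;\ge\; \sup_{\pi} w(x,\pi) - \varepsilon .
\]
```

Under this reading, the abstract's claim is that for every $\varepsilon > 0$ some stationary policy $\varphi$ satisfies the second inequality, where the supremum is taken over all policies and not only over stationary ones.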
@article{TVP_1978_23_2_a5,
author = {E. A. Faǐnberg},
title = {The existence of a~stationary $\varepsilon$-optimal policy for a~finite {Markov} chain},
journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
pages = {313--330},
year = {1978},
volume = {23},
number = {2},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/TVP_1978_23_2_a5/}
}
E. A. Faǐnberg. The existence of a stationary $\varepsilon$-optimal policy for a finite Markov chain. Teoriâ veroâtnostej i ee primeneniâ, Vol. 23 (1978) no. 2, pp. 313-330. http://geodesic.mathdoc.fr/item/TVP_1978_23_2_a5/