On a class of policies in general Markov decision models
Teoriâ veroâtnostej i ee primeneniâ, Volume 18 (1973) no. 4, pp. 815-817
See the article record from the source Math-Net.Ru
The paper studies stationary policies that, for some final reward, are optimal on every time interval $[0, n]$ and yield a total gain that depends linearly on $n$. Necessary and sufficient conditions for the existence of such policies are given in the form of equations (4) and (5). These equations have appeared previously, in various cases, as sufficient optimality conditions for the average-per-unit-time criterion.
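As a hedged illustration of the kind of statement involved, consider a simplified setting that is an assumption made here and is not taken from the paper: a finite state space, rewards $r(x,a)$, transition probabilities $p(y \mid x,a)$, a constant $g$ and a function $h$. The symbols and the equation below are the textbook average-reward optimality equation, not necessarily the paper's equations (4) and (5), which this record does not reproduce. If a stationary policy $\varphi$ attains the maximum in that equation, then with final reward $h$ the total gain on $[0, n]$ is $ng + h(x)$, which is linear in $n$:

```latex
% Assumed setting (not from the paper): finite states, constant g, final reward h.
% Optimality equation, with the maximum attained at a = \varphi(x):
\[
  g + h(x) = \max_{a} \Bigl[ r(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Bigr].
\]
% Backward induction over the horizon, starting from the final reward:
\[
  V_0(x) = h(x), \qquad
  V_n(x) = \max_{a} \Bigl[ r(x,a) + \sum_{y} p(y \mid x,a)\, V_{n-1}(y) \Bigr].
\]
% Induction step: if V_{n-1}(y) = (n-1)g + h(y), then, since \sum_y p(y \mid x,a) = 1,
\[
  V_n(x) = (n-1)g + \max_{a} \Bigl[ r(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Bigr]
         = n g + h(x).
\]
```

In this sketch the maximizing action at every step is $\varphi(x)$, so the same stationary policy is optimal for each horizon $n$.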
@article{TVP_1973_18_4_a11,
author = {A. A. Yushkevich},
title = {On a class of policies in general {Markov} decision models},
journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
pages = {815--817},
publisher = {mathdoc},
volume = {18},
number = {4},
year = {1973},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/TVP_1973_18_4_a11/}
}
A. A. Yushkevich. On a class of policies in general Markov decision models. Teoriâ veroâtnostej i ee primeneniâ, Volume 18 (1973) no. 4, pp. 815-817. http://geodesic.mathdoc.fr/item/TVP_1973_18_4_a11/