Gittins index for simple family of Markov bandit processes with switching cost and no discounting
Teoriâ veroâtnostej i ee primeneniâ, Volume 64 (2019) no. 3, pp. 442-455
See the article record from the source Math-Net.Ru
We consider the multiarmed bandit problem (the problem of Markov bandits)
with switching penalties and no discounting in the case where the state spaces of all bandits are finite.
A strategy is optimal if it attains the largest average reward per unit time over
an infinite time horizon.
For this problem it is shown that an optimal strategy can be specified by a Gittins index
under the natural assumption that the switching penalties are nonnegative.
Keywords:
multicomponent systems, Gittins index,
simple family of alternative Markov bandit processes,
multiarmed bandit problem, Markov decision process, controlled Markov processes,
long run average return, no discounting, switching penalties,
optimal strategy.
@article{TVP_2019_64_3_a1,
author = {M. P. Savelov},
title = {Gittins index for simple family of {Markov} bandit processes with switching cost and no discounting},
journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
pages = {442--455},
publisher = {mathdoc},
volume = {64},
number = {3},
year = {2019},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/}
}
TY  - JOUR
AU  - M. P. Savelov
TI  - Gittins index for simple family of Markov bandit processes with switching cost and no discounting
JO  - Teoriâ veroâtnostej i ee primeneniâ
PY  - 2019
SP  - 442
EP  - 455
VL  - 64
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/
LA  - ru
ID  - TVP_2019_64_3_a1
ER  -
M. P. Savelov. Gittins index for simple family of Markov bandit processes with switching cost and no discounting. Teoriâ veroâtnostej i ee primeneniâ, Volume 64 (2019) no. 3, pp. 442-455. http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/