Gittins index for simple family of Markov bandit processes with switching cost and no discounting
Teoriâ veroâtnostej i ee primeneniâ, Volume 64 (2019) no. 3, pp. 442-455

See the article record from the source Math-Net.Ru

We consider the multiarmed bandit problem (the problem of Markov bandits) with switching penalties and no discounting in the case where the state spaces of all bandits are finite. An optimal strategy is one that attains the largest average reward per unit time over an infinite time horizon. For this problem we show that, under the natural assumption that the switching penalties are nonnegative, an optimal strategy can be specified by a Gittins index.
Keywords: multicomponent systems, Gittins index, simple family of alternative Markov bandit processes, multiarmed bandit problem, Markov decision process, controlled Markov processes, long run average return, no discounting, switching penalties, optimal strategy.
@article{TVP_2019_64_3_a1,
     author = {M. P. Savelov},
     title = {Gittins index for simple family of {Markov} bandit processes with switching cost and no discounting},
     journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
     pages = {442--455},
     publisher = {mathdoc},
     volume = {64},
     number = {3},
     year = {2019},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/}
}
TY  - JOUR
AU  - M. P. Savelov
TI  - Gittins index for simple family of Markov bandit processes with switching cost and no discounting
JO  - Teoriâ veroâtnostej i ee primeneniâ
PY  - 2019
SP  - 442
EP  - 455
VL  - 64
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/
LA  - ru
ID  - TVP_2019_64_3_a1
ER  - 
%0 Journal Article
%A M. P. Savelov
%T Gittins index for simple family of Markov bandit processes with switching cost and no discounting
%J Teoriâ veroâtnostej i ee primeneniâ
%D 2019
%P 442-455
%V 64
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/
%G ru
%F TVP_2019_64_3_a1
M. P. Savelov. Gittins index for simple family of Markov bandit processes with switching cost and no discounting. Teoriâ veroâtnostej i ee primeneniâ, Volume 64 (2019) no. 3, pp. 442-455. http://geodesic.mathdoc.fr/item/TVP_2019_64_3_a1/