UCB strategies and optimization of batch processing in a one-armed bandit problem
Matematičeskaâ teoriâ igr i eë priloženiâ, Vol. 15 (2023) no. 4, pp. 3-27

See the article record at the source, Math-Net.Ru

We consider a Gaussian one-armed bandit problem, which arises in optimizing batch data processing when there are two alternative processing methods and the efficiency of the first method is known a priori. During processing, one must determine the more effective method and ensure that it is used preferentially. This optimal control problem is interpreted as a game with nature. We investigate the cases of known and a priori unknown variance of the income corresponding to the second method. The control goal is considered in a minimax setting, and UCB strategies are used to achieve it. In all the cases studied, we obtain invariant descriptions of the control on a unit horizon; these depend only on the number of batches into which the data are divided, not on the total amount of data. These descriptions allow approximately optimal strategy parameters to be determined by Monte Carlo simulation. Numerical results show the high efficiency of the proposed UCB strategies.
Keywords: Gaussian one-armed bandit, minimax approach, UCB rule, invariant description, Monte Carlo simulation.
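
To make the setting in the abstract concrete, the following is a minimal sketch of a batch UCB rule for a Gaussian one-armed bandit with known variance, together with a Monte Carlo estimate of its regret. It is an illustration under stated assumptions, not the strategy from the article: the index form (sample mean plus `a * sqrt(var / n2)`), the parameter `a`, the default values, and all function names are hypothetical choices made here.

```python
import math
import random


def run_batch_ucb(m2, n_items=1000, n_batches=20, m1=0.0, var=1.0, a=1.0, rng=None):
    """Simulate one run and return its regret (loss of expected income).

    Method 1 has known mean income m1; method 2 has unknown mean m2 and
    known variance var. Assumes n_items is divisible by n_batches.
    """
    rng = rng or random.Random()
    batch = n_items // n_batches
    sd_batch = math.sqrt(var * batch)  # std of the summed income of one batch
    total2 = 0.0  # cumulative income observed from method 2
    n2 = 0        # number of items processed with method 2
    for _ in range(n_batches):
        if n2 > 0:
            # Assumed UCB index: sample mean plus a confidence bonus.
            ucb = total2 / n2 + a * math.sqrt(var / n2)
            if ucb < m1:
                # No new data on method 2 will arrive, so the index is
                # frozen and method 1 is used for all remaining batches.
                break
        # Sum of `batch` i.i.d. N(m2, var) incomes is N(m2*batch, var*batch).
        total2 += rng.gauss(m2 * batch, sd_batch)
        n2 += batch
    best = max(m1, m2)
    return n2 * (best - m2) + (n_items - n2) * (best - m1)


def estimate_regret(m2, runs=2000, seed=0, **kwargs):
    """Monte Carlo estimate of the expected regret for a fixed mean m2."""
    rng = random.Random(seed)
    return sum(run_batch_ucb(m2, rng=rng, **kwargs) for _ in range(runs)) / runs


def worst_case_regret(a, means, **kwargs):
    """Largest estimated regret over a grid of candidate means (minimax view)."""
    return max(estimate_regret(m2, a=a, **kwargs) for m2 in means)


if __name__ == "__main__":
    grid = [-0.2, -0.1, -0.05, 0.05, 0.1, 0.2]
    for a in (0.5, 1.0, 2.0):
        print(a, worst_case_regret(a, grid, runs=500))
```

Scanning `a` over a grid and keeping the value with the smallest worst-case regret mimics, in a crude way, the idea of tuning strategy parameters by simulation; the grid of means and the number of runs here are arbitrary.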
@article{MGTA_2023_15_4_a0,
     author = {Sergey V. Garbar and Alexander V. Kolnogorov and Alexey N. Lazutchenko},
     title = {UCB strategies and optimization of batch processing in a one-armed bandit problem},
     journal = {Matemati\v{c}eska\^a teori\^a igr i e\"e prilo\v{z}eni\^a},
     pages = {3--27},
     publisher = {mathdoc},
     volume = {15},
     number = {4},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MGTA_2023_15_4_a0/}
}
Sergey V. Garbar; Alexander V. Kolnogorov; Alexey N. Lazutchenko. UCB strategies and optimization of batch processing in a one-armed bandit problem. Matematičeskaâ teoriâ igr i eë priloženiâ, Vol. 15 (2023) no. 4, pp. 3-27. http://geodesic.mathdoc.fr/item/MGTA_2023_15_4_a0/