One-armed bandit problem and the mirror descent algorithm

Dmitry N. Shiyan

Dmitry N. Shiyan

Matematičeskaâ teoriâ igr i eë priloženiâ, Tome 15 (2023) no. 3, pp. 88-106 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice de l'article

Résumé

We consider the application of the mirror descent algorithm (MDA) to the one-armed bandit problem in the minimax statement as applied to data processing. This problem is also known as the game with nature, where the player's payoff function is the mathematical expectation of the total income. The player must determine the most effective method of the two available and provide that it is predominantly used. In this case, the a priori efficiency of one of the methods is known. This article proposes a modification of the MDA that allows to improve the efficiency of control through the use of additional a priori information. The proposed strategy retains the characteristic property of strategies for one-armed bandits – if a known action is applied once, it will be applied until the end of the control. Modifications for the algorithm for one-by-one processing and for its batch version are considered. Batch processing is interesting in that the total processing time is determined by the number of batches and not the original amount of data, if it is possible to provide parallel processing of data in batches. For the proposed algorithms, using the Monte-Carlo simulation, the optimal values of the tunable parameters were calculated and the minimax risk estimates were obtained.

Keywords: two-armed bandit problem, one-armed bandit problem, minimax approach, mirror descent algorithm, batch processing.
Mots-clés : EXP3

@article{MGTA_2023_15_3_a4,
     author = {Dmitry N. Shiyan},
     title = {One-armed bandit problem and the mirror descent algorithm},
     journal = {Matemati\v{c}eska\^a teori\^a igr i e\"e prilo\v{z}eni\^a},
     pages = {88--106},
     year = {2023},
     volume = {15},
     number = {3},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MGTA_2023_15_3_a4/}
}

TY  - JOUR
AU  - Dmitry N. Shiyan
TI  - One-armed bandit problem and the mirror descent algorithm
JO  - Matematičeskaâ teoriâ igr i eë priloženiâ
PY  - 2023
SP  - 88
EP  - 106
VL  - 15
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/MGTA_2023_15_3_a4/
LA  - ru
ID  - MGTA_2023_15_3_a4
ER  -

%0 Journal Article
%A Dmitry N. Shiyan
%T One-armed bandit problem and the mirror descent algorithm
%J Matematičeskaâ teoriâ igr i eë priloženiâ
%D 2023
%P 88-106
%V 15
%N 3
%U http://geodesic.mathdoc.fr/item/MGTA_2023_15_3_a4/
%G ru
%F MGTA_2023_15_3_a4

Dmitry N. Shiyan. One-armed bandit problem and the mirror descent algorithm. Matematičeskaâ teoriâ igr i eë priloženiâ, Tome 15 (2023) no. 3, pp. 88-106. http://geodesic.mathdoc.fr/item/MGTA_2023_15_3_a4/

Bibliographie
Cité par

[1] Borovkov A.A., Matematicheskaya statistika. Dopolnitelnye glavy, Uchebnoe posobie dlya vuzov, Nauka. Glavnaya redaktsiya fiziko-matematicheskoi literatury, M., 1984

[2] Varshavskii V.I., Kollektivnoe povedenie avtomatov, Nauka, M., 1973

[3] Kolnogorov A.V., “Zadacha ob odnorukom bandite dlya sistem s parallelnoi obrabotkoi dannykh”, Probl. peredachi inform., 51:2 (2015), 9–113 | MR

[4] Kolnogorov A.V., Nazin A.V., Shiyan D.N., “Zadacha o dvurukom bandite i paketnaya versiya AZS”, MTIP, 13:2 (2021), 9–39

[5] Nazin A.V., Poznyak A.S., Adaptivnyi vybor variantov, Nauka, M., 1986

[6] Nemirovskii A.S., Yudin D.B., “Effektivnye metody resheniya zadach vypuklogo programmirovaniya bolshoi razmernosti”, Ekonomika i matem. metody, 15:1 (1979), 135–152 | MR | Zbl

[7] Presman E.L., Sonin I.M., Posledovatelnoe upravlenie po nepolnym dannym, Nauka, M., 1982

[8] Satton R.S., Barto E.G., Obuchenie s podkrepleniem, DMK Press, M., 2020

[9] Sragovich V.G., Adaptivnoe upravlenie, Nauka, M., 1981

[10] Tsetlin M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem, Nauka, M., 1969

[11] Berry D.A., Fristedt B., Bandit Problems: Sequential Allocation of Experiments, Chapman and Hall, London–New York, 1985 | MR | Zbl

[12] Bradt R.N., Johnson S.M., Karlin S., “On Sequential Designs for Maximizing the Sum of n Observations”, Ann. Math. Statist., 27:4 (1956), 1060–1074 | DOI | MR | Zbl

Parcourir par

Geodesic

Parcourir par