Gaussian one-armed bandit with both unknown parameters

A. V. Kolnogorov

A. V. Kolnogorov

Sibirskie èlektronnye matematičeskie izvestiâ, Tome 19 (2022) no. 2, pp. 639-650 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice de l'article

Résumé

We consider the one-armed bandit problem as applied to data processing. We assume that there are two alternative processing methods and efficiency of the second method is a priory unknown. During control process, one has to determine if the second method is more efficient than the first one and to provide a primary application of the most efficient method. The essential feature of considered approach is that the data is processed in batches and cumulative incomes in batches are used for the control. If the sizes of batches are large enough then according to the central limit theorem incomes in batches are approximately Gaussian. Also if the sizes of batches are large, one can estimate the variances of incomes during the processing initial batches and then use these estimates for the control. However, for batches of moderate sizes it is reasonable to estimate unknown variances throughout the control process. This optimization problem is described by Gaussian one-armed bandit with both unknown parameters. Given a prior distribution of unknown parameters of the second action, we derive a recursive Bellman-type equation for determining corresponding Bayesian strategy and Bayesian risk. Minimax strategy and minimax risk are searched for according to the main theorem of the game theory as Bayesian ones corresponding to the worst-case prior distribution.

Keywords: one-armed bandit, Bayesian and minimax approaches, main theorem of the game theory, batch processing.

@article{SEMR_2022_19_2_a32,
     author = {A. V. Kolnogorov},
     title = {Gaussian one-armed bandit with both unknown parameters},
     journal = {Sibirskie \`elektronnye matemati\v{c}eskie izvesti\^a},
     pages = {639--650},
     year = {2022},
     volume = {19},
     number = {2},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/SEMR_2022_19_2_a32/}
}

TY  - JOUR
AU  - A. V. Kolnogorov
TI  - Gaussian one-armed bandit with both unknown parameters
JO  - Sibirskie èlektronnye matematičeskie izvestiâ
PY  - 2022
SP  - 639
EP  - 650
VL  - 19
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/SEMR_2022_19_2_a32/
LA  - en
ID  - SEMR_2022_19_2_a32
ER  -

%0 Journal Article
%A A. V. Kolnogorov
%T Gaussian one-armed bandit with both unknown parameters
%J Sibirskie èlektronnye matematičeskie izvestiâ
%D 2022
%P 639-650
%V 19
%N 2
%U http://geodesic.mathdoc.fr/item/SEMR_2022_19_2_a32/
%G en
%F SEMR_2022_19_2_a32

A. V. Kolnogorov. Gaussian one-armed bandit with both unknown parameters. Sibirskie èlektronnye matematičeskie izvestiâ, Tome 19 (2022) no. 2, pp. 639-650. http://geodesic.mathdoc.fr/item/SEMR_2022_19_2_a32/

Bibliographie
Cité par

[1] D.A. Berry, B. Fristedt, Bandit problems. Sequential allocation of experiments, Chapman and Hall, London - New York, 1985 | MR | Zbl

[2] E.L. Presman, I.N. Sonin, Sequential control with incomplete information. The Bayesian approach to multi-armed bandit problems, Academic Press, Inc., London etc, 1990 | MR | Zbl

[3] T. Lattimore, C. Szepesvári, Bandit algorithms, Cambridge University Press, Cambridge, 2020 | Zbl

[4] M.L Tsetlin, Automaton theory and modelling of biological systems, Academic Press, London etc, 1973 | MR | Zbl

[5] V.G. Sragovich, Mathematical theory of adaptive control, World Sci., Hackensack, 2006 | MR | Zbl

[6] J.C. Gittins, Multi-armed bandit allocation indices, Wiley-Interscience Series in Systems and Optimization, John Wiley Sons, Chichester, 1989 | MR | Zbl

[7] R.N. Bradt, S.M. Johnson, S. Karlin, “On sequential designs for maximizing the sum of n observations”, Ann. Math. Stat., 27 (1956), 1060–1074 | DOI | MR | Zbl

[8] H. Chernoff, S.N. Ray, “A Bayes sequential sampling inspection plan”, Ann. Math. Stat., 36 (1965), 1387–1407 | DOI | MR | Zbl

[9] A.V. Kolnogorov, “One-armed bandit problem for parallel data processing systems”, Probl. Inf. Transm., 51:2 (2015), 177–191 | DOI | MR | Zbl

[10] A. Kolnogorov, “Gaussian one-armed bandit problem”, 2021 XVII International symposium Problems of redundancy in information and control systems (REDUNDANCY), 2021, 74–79

[11] T.L. Lai, B. Levin, H. Robbins, D. Siegmund, “Sequential medical trials”, Proc. Natl. Acad. Sci. USA, 77:6 (1980), 3135–3138 | DOI | MR | Zbl

[12] V. Perchet, P. Rigollet, S. Chassang, E. Snowberg, “Batched bandit problems”, Ann. Stat., 44:2 (2016), 660–681 | DOI | MR | Zbl

[13] A.V. Kolnogorov, Gaussian one-armed bandit and optimization of batch data processing, 2019, arXiv: 1901.08845 | MR

Parcourir par

Geodesic

Parcourir par