Invariant description of control in a Gaussian one-armed bandit problem
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie, Volume 17 (2024) no. 1, pp. 27-36. This article was harvested from the source Math-Net.Ru.


We consider the one-armed bandit problem as applied to batch data processing when there are two alternative processing methods with different efficiencies and the efficiency of the second method is a priori unknown. During processing, it is necessary to identify the more effective method and ensure its preferential use. Processing is performed in batches, so the distributions of incomes are Gaussian. We consider the case in which both the mathematical expectation and the variance of the income corresponding to the second action are a priori unknown. This case describes a situation in which the batches themselves and their number are of moderate or small size. We obtain recursive equations for computing the Bayesian risk and regret, which we then present in an invariant form with a control horizon equal to one. This makes it possible to obtain estimates of the Bayesian and minimax risks that are valid for all control horizons that are multiples of the number of processed batches.
Keywords: one-armed bandit, batch processing, Bayesian and minimax approaches, invariant description.
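The batch setting described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the recursive Bayesian computation from the article: it uses a hypothetical fixed-threshold stopping rule, assumes the standard deviation `sigma` of the second action is known (the article treats it as unknown), and credits the known first action with its expected income `m1`. All function and parameter names are assumptions made for illustration.

```python
import math
import random


def simulate_regret(m2, sigma, batch_size, n_batches, m1=0.0,
                    threshold=1.0, n_runs=2000, seed=0):
    """Monte Carlo regret of a simple threshold rule (illustrative only)."""
    rng = random.Random(seed)
    horizon = batch_size * n_batches
    # Expected income of an oracle that always uses the better action.
    best_total = horizon * max(m1, m2)
    total_income = 0.0
    for _ in range(n_runs):
        cum_sum = 0.0  # cumulative income from the second (unknown) action
        used = 0       # items already processed by the second action
        income = 0.0
        for _ in range(n_batches):
            # A batch income is modelled as Gaussian with mean
            # batch_size * m2 and variance batch_size * sigma**2.
            batch = rng.gauss(batch_size * m2, sigma * math.sqrt(batch_size))
            cum_sum += batch
            used += batch_size
            income += batch
            # Hypothetical rule: abandon the second action for good once
            # its centred cumulative income drops below the threshold.
            if cum_sum - used * m1 < -threshold * sigma * math.sqrt(horizon):
                income += (horizon - used) * m1  # expected income, arm 1
                break
        total_income += income
    return best_total - total_income / n_runs


def normalized_regret(c, sigma, batch_size, n_batches, **kwargs):
    """Regret divided by sigma * sqrt(N), with m2 = c * sigma / sqrt(N)."""
    horizon = batch_size * n_batches
    scale = sigma * math.sqrt(horizon)
    m2 = c * sigma / math.sqrt(horizon)
    return simulate_regret(m2, sigma, batch_size, n_batches, **kwargs) / scale


if __name__ == "__main__":
    # Same number of batches, different batch sizes and sigma.
    print(normalized_regret(-1.0, 1.0, 10, 10))
    print(normalized_regret(-1.0, 2.0, 40, 10))
```

Under these assumptions, with the number of batches held fixed, the regret divided by `sigma * sqrt(N)` depends only on the scaled parameter `c` and not on the batch size or on `sigma`. This loosely mirrors the idea of describing control on a horizon normalized to one, although the article's invariant equations are for the Bayesian strategy, not for this threshold rule.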
@article{VYURU_2024_17_1_a2,
     author = {A. V. Kolnogorov},
     title = {Invariant description of control in a {Gaussian} one-armed bandit problem},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a, Matemati\v{c}eskoe modelirovanie i programmirovanie},
     pages = {27--36},
     year = {2024},
     volume = {17},
     number = {1},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VYURU_2024_17_1_a2/}
}
TY  - JOUR
AU  - A. V. Kolnogorov
TI  - Invariant description of control in a Gaussian one-armed bandit problem
JO  - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie
PY  - 2024
SP  - 27
EP  - 36
VL  - 17
IS  - 1
UR  - http://geodesic.mathdoc.fr/item/VYURU_2024_17_1_a2/
LA  - en
ID  - VYURU_2024_17_1_a2
ER  - 
%0 Journal Article
%A A. V. Kolnogorov
%T Invariant description of control in a Gaussian one-armed bandit problem
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie
%D 2024
%P 27-36
%V 17
%N 1
%U http://geodesic.mathdoc.fr/item/VYURU_2024_17_1_a2/
%G en
%F VYURU_2024_17_1_a2
A. V. Kolnogorov. Invariant description of control in a Gaussian one-armed bandit problem. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie, Volume 17 (2024) no. 1, pp. 27-36. http://geodesic.mathdoc.fr/item/VYURU_2024_17_1_a2/
