See the article record from the source Math-Net.Ru
@article{MGTA_2023_15_3_a4,
  author = {Dmitry N. Shiyan},
  title = {One-armed bandit problem and the mirror descent algorithm},
  journal = {Matemati\v{c}eska\^a teori\^a igr i e\"e prilo\v{z}eni\^a},
  pages = {88--106},
  publisher = {mathdoc},
  volume = {15},
  number = {3},
  year = {2023},
  language = {ru},
  url = {http://geodesic.mathdoc.fr/item/MGTA_2023_15_3_a4/}
}
Dmitry N. Shiyan. One-armed bandit problem and the mirror descent algorithm. Matematičeskaâ teoriâ igr i eë priloženiâ, Vol. 15 (2023) no. 3, pp. 88-106. http://geodesic.mathdoc.fr/item/MGTA_2023_15_3_a4/
[1] Borovkov A.A., Matematicheskaya statistika. Dopolnitelnye glavy, Uchebnoe posobie dlya vuzov, Nauka. Glavnaya redaktsiya fiziko-matematicheskoi literatury, M., 1984
[2] Varshavskii V.I., Kollektivnoe povedenie avtomatov, Nauka, M., 1973
[3] Kolnogorov A.V., “Zadacha ob odnorukom bandite dlya sistem s parallelnoi obrabotkoi dannykh”, Probl. peredachi inform., 51:2 (2015), 99–113 | MR
[4] Kolnogorov A.V., Nazin A.V., Shiyan D.N., “Zadacha o dvurukom bandite i paketnaya versiya AZS”, MTIP, 13:2 (2021), 9–39
[5] Nazin A.V., Poznyak A.S., Adaptivnyi vybor variantov, Nauka, M., 1986
[6] Nemirovskii A.S., Yudin D.B., “Effektivnye metody resheniya zadach vypuklogo programmirovaniya bolshoi razmernosti”, Ekonomika i matem. metody, 15:1 (1979), 135–152 | MR | Zbl
[7] Presman E.L., Sonin I.M., Posledovatelnoe upravlenie po nepolnym dannym, Nauka, M., 1982
[8] Satton R.S., Barto E.G., Obuchenie s podkrepleniem, DMK Press, M., 2020
[9] Sragovich V.G., Adaptivnoe upravlenie, Nauka, M., 1981
[10] Tsetlin M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem, Nauka, M., 1969
[11] Berry D.A., Fristedt B., Bandit Problems: Sequential Allocation of Experiments, Chapman and Hall, London–New York, 1985 | MR | Zbl
[12] Bradt R.N., Johnson S.M., Karlin S., “On Sequential Designs for Maximizing the Sum of n Observations”, Ann. Math. Statist., 27:4 (1956), 1060–1074 | DOI | MR | Zbl