Learning the distribution with largest mean: two bandit frameworks

Emilie Kaufmann; Aurélien Garivier

doi:10.1051/proc/201760114

ESAIM. Proceedings, Volume 60 (2017), pp. 114-131

This article was harvested from the EDP Sciences source.

Abstract

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications such as online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; both can be formulated as sequentially learning which distribution has the highest mean among a set of distributions, under some constraints on the learning process. For both of them (regret minimization and best arm identification), we present recent, asymptotically optimal algorithms. We compare the behavior of each algorithm's sampling rule, as well as the complexity terms associated with each problem.
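The abstract contrasts two objectives on the same model. In regret minimization, the learner tries to keep the pseudo-regret T·μ* − Σ_t μ_{A_t} small over a horizon T; in fixed-confidence best arm identification, it must recommend the arm with the largest mean with probability at least 1 − δ while using as few samples as possible. The Python sketch below is a rough illustration only, on hypothetical Bernoulli arms. It assumes a kl-UCB-style index policy for the first task and successive elimination with Hoeffding confidence radii for the second. Successive elimination in particular is a simple δ-correct baseline swapped in for brevity, not one of the asymptotically optimal algorithms presented in the paper; all function names, arm means, the horizon and δ are made up for illustration.

```python
import math
import random

# Hypothetical helper names for illustration; not taken from the paper.


def kl_bernoulli(p: float, q: float, eps: float = 1e-12) -> float:
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, t: int, iters: int = 30) -> float:
    """Largest q >= mean with pulls * kl(mean, q) <= log(t), found by bisection."""
    level = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) > level:
            hi = mid
        else:
            lo = mid
    return lo


def regret_minimization(mus, horizon, rng):
    """kl-UCB-style sampling (assumed); returns (pseudo-regret, pull counts)."""
    k = len(mus)
    counts, sums = [0] * k, [0.0] * k
    best = max(mus)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:  # pull the arm with the largest upper confidence index
            arm = max(range(k), key=lambda a: kl_ucb_index(sums[a] / counts[a], counts[a], t))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < mus[arm] else 0.0
        regret += best - mus[arm]  # pseudo-regret uses the true means
    return regret, counts


def best_arm_identification(mus, delta, rng):
    """Successive elimination (swapped-in baseline); returns (arm, samples used).

    Assumes distinct means, otherwise the loop may not terminate.
    """
    k = len(mus)
    active = list(range(k))
    sums = [0.0] * k
    r, samples = 0, 0
    while len(active) > 1:
        r += 1
        for a in active:  # sample every surviving arm once per round
            sums[a] += 1.0 if rng.random() < mus[a] else 0.0
        samples += len(active)
        # Hoeffding radius with a union bound over arms and rounds
        radius = math.sqrt(math.log(4 * k * r * r / delta) / (2 * r))
        leader = max(sums[a] / r for a in active)
        active = [a for a in active if leader - sums[a] / r <= 2 * radius]
    return active[0], samples


if __name__ == "__main__":
    rng = random.Random(0)
    mus = [0.2, 0.5, 0.8]  # hypothetical Bernoulli means
    regret, counts = regret_minimization(mus, horizon=2000, rng=rng)
    print(f"regret={regret:.1f}, pulls={counts}")
    arm, samples = best_arm_identification(mus, delta=0.05, rng=rng)
    print(f"recommended arm={arm}, samples used={samples}")
```

In a typical run, the index policy concentrates almost all of its pulls on the best arm, whereas the elimination procedure keeps sampling every surviving arm at the same rate until the confidence intervals separate. This hints at why the two problems call for different sampling rules and lead to different complexity terms.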

@article{EP_2017_60_a6,
     author = {Emilie Kaufmann and Aur\'elien Garivier},
     title = {Learning the distribution with largest mean: two bandit frameworks},
     journal = {ESAIM. Proceedings},
     pages = {114--131},
     year = {2017},
     volume = {60},
     doi = {10.1051/proc/201760114},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.1051/proc/201760114/}
}

TY  - JOUR
AU  - Emilie Kaufmann
AU  - Aurélien Garivier
TI  - Learning the distribution with largest mean: two bandit frameworks
JO  - ESAIM. Proceedings
PY  - 2017
SP  - 114
EP  - 131
VL  - 60
UR  - http://geodesic.mathdoc.fr/articles/10.1051/proc/201760114/
DO  - 10.1051/proc/201760114
LA  - en
ID  - EP_2017_60_a6
ER  -

%0 Journal Article
%A Emilie Kaufmann
%A Aurélien Garivier
%T Learning the distribution with largest mean: two bandit frameworks
%J ESAIM. Proceedings
%D 2017
%P 114-131
%V 60
%U http://geodesic.mathdoc.fr/articles/10.1051/proc/201760114/
%R 10.1051/proc/201760114
%G en
%F EP_2017_60_a6

Emilie Kaufmann; Aurélien Garivier. Learning the distribution with largest mean: two bandit frameworks. ESAIM. Proceedings, Volume 60 (2017), pp. 114-131. doi: 10.1051/proc/201760114
